[Paper] TokenScale: Timely and Accurate Autoscaling for Disaggregated LLM Serving with Token Velocity
The architectural shift to prefill/decode (PD) disaggregation in LLM serving improves resource utilization but struggles with the bursty nature of modern worklo...