[Paper] A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving
To meet strict Service-Level Objectives (SLOs), contemporary Large Language Models (LLMs) decouple the prefill and decoding stages and place them on separate GPU...