EUNO.NEWS EUNO.NEWS
  • All (2571) +248
  • AI (578) +19
  • DevOps (150) +2
  • Software (1091) +156
  • IT (746) +70
  • Education (6) +1
  • Notice
  • All (2571) +248
    • AI (578) +19
    • DevOps (150) +2
    • Software (1091) +156
    • IT (746) +70
    • Education (6) +1
  • Notice
  • All (2571) +248
  • AI (578) +19
  • DevOps (150) +2
  • Software (1091) +156
  • IT (746) +70
  • Education (6) +1
  • Notice
Sources Tags Search
한국어 English 中文
  • 1 week ago · devops

    [Paper] A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving

    To meet strict Service-Level Objectives (SLOs),contemporary Large Language Models (LLMs) decouple the prefill and decoding stages and place them on separate GPU...

    #LLM inference #dynamic scaling #GPU orchestration #goodput optimization #serving architecture
EUNO.NEWS
RSS GitHub © 2025