EUNO.NEWS
  • All (20349) +286
  • AI (3104) +14
  • DevOps (907) +7
  • Software (10509) +190
  • IT (5781) +75
  • Education (48)
  • Notice
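  • 1 month ago · devops

    [Paper] Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving

    To meet strict service-level objectives (SLOs), contemporary large language models (LLMs) decouple the prefill and decoding stages and place them on separate GPUs…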

    #LLM inference #dynamic scaling #GPU orchestration #goodput optimization #serving architecture
RSS GitHub © 2026