EUNO.NEWS
  • All (2544) +222
  • AI (576) +17
  • DevOps (149) +1
  • Software (1083) +148
  • IT (730) +55
  • Education (6) +1
  • Notice
  • 1 day ago · ai

    [Paper] AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving

    As augmented large language models (LLMs) with external tools become increasingly popular in web applications, improving augmented LLM inference serving efficie...

    #LLM serving #adaptive scheduling #dynamic batching #inference optimization #augmented LLM
  • 1 week ago · ai

    [Paper] DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving

    Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. Existing spe...

    #speculative decoding #LLM serving #edge‑cloud inference #distributed inference #adaptive window control
  • 1 week ago · ai

    [Paper] Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows

    Agentic workflows have emerged as a powerful paradigm for solving complex, multi-stage tasks, but serving them at scale is computationally expensive given the m...

    #model routing #agentic workflows #LLM serving #scalable inference #cost optimization
EUNO.NEWS
RSS GitHub © 2025