dynamic batching

1 month ago · ai

[Paper] AugServe: Adaptive Request Scheduling for Augmented Large Language Model Inference Serving

As augmented large language models (LLMs) with external tools become increasingly popular in web applications, improving augmented LLM inference serving efficie...

#LLM serving #adaptive scheduling #dynamic batching #inference optimization #augmented LLM