[Paper] Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends
Source: arXiv - 2606.07248v1
Overview
Serial LLM inference backends — such as Ollama — process requests one at a time under FCFS admission, causing Head-of-Line Blocking (HOLB) under mixed workloads at high utilisation: short factual queries can be delayed by minutes behind long generation jobs. While cloud-scale deployments mitigate HOLB via continuous batching (vLLM, Orca), these solutions require tens of GB of VRAM for concurrent KV-caches — infeasible for memory-constrained edge and local deployments that rely on serial request dispatch. We present \clairvoyant, a drop-in sidecar proxy for any serial OpenAI-compatible backend (e.g., Ollama, llama.cpp). \clairvoyant predicts response length from 19 lightweight lexical features via an ONNX-exported XGBoost classifier, achieving 0.029,ms per-request latency (four orders of magnitude below typical generation time). Because admission scheduling depends on relative ordering rather than exact prediction, the system optimises ranking fidelity, achieving 62—96% in-distribution and 52—66% cross-distribution accuracy across natural conversation datasets. We find that curated instruction datasets are degenerate training sources for length prediction: GPT-imposed brevity constraints reduce Long-class representation to under 0.02% of examples, making natural conversation logs the only viable training source. End-to-end GPU benchmarks on an RTX~4090 show 70—76% P50 latency reduction for short requests under maximum queue pressure (100 concurrent requests) and 17% under steady-state Poisson arrivals ($ρ=0.74$). \clairvoyant is open-source and requires no modifications to the inference backend.
Key Contributions
This paper presents research in the following areas:
- cs.DC
Methodology
Please refer to the full paper for detailed methodology.
Practical Implications
This research contributes to the advancement of cs.DC.
Authors
- Aravind Sundaresan
Paper Information
- arXiv ID: 2606.07248v1
- Categories: cs.DC
- Published: June 5, 2026
- PDF: Download PDF