[Paper] Cluster Workload Allocation: Semantic Soft Affinity Using Natural Language Processing

Published: January 14, 2026 at 03:36 AM EST
4 min read

Source: arXiv - 2601.09282v1

Overview

The paper proposes a new way to schedule workloads on Kubernetes clusters by letting users express placement preferences in plain English. By plugging a Large Language Model (LLM) into the Kubernetes scheduler, the system translates natural‑language “hints” into soft‑affinity rules, dramatically lowering the expertise barrier for DevOps teams and developers who need to fine‑tune pod placement.

Key Contributions

  • Intent‑driven scheduling: Introduces a semantic “soft affinity” model where allocation hints are written in natural language rather than YAML‑encoded selectors (a sketch of the contrast follows this list).
  • LLM‑powered scheduler extender: Implements a Kubernetes scheduler extender that calls an LLM (via AWS Bedrock) to parse the hints and generate affinity/anti‑affinity constraints on the fly.
  • Cluster‑state cache: Adds a lightweight cache of node resources to keep the LLM calls stateless and fast enough for scheduling decisions.
  • Empirical evaluation: Shows >95 % subset‑accuracy in parsing intent across top‑tier LLMs (Amazon Nova Pro/Premier, Mistral Pixtral Large) and demonstrates placement quality that meets or exceeds hand‑crafted Kubernetes configurations in six test scenarios.
  • Open‑source prototype: Provides a reference implementation that can be dropped into existing clusters for experimentation.
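
The contrast behind the first contribution can be made concrete with a small sketch. The annotation key scheduling.example.com/intent, the label names, and the weights below are assumptions for illustration; the paper does not publish its exact annotation format. Expressed as Python dicts (the structures the Kubernetes Python client accepts):

```python
# Hypothetical pod-spec fragment: the placement preference is a plain-English hint.
# The annotation key "scheduling.example.com/intent" is assumed, not taken from the paper.
intent_annotated_pod = {
    "metadata": {
        "name": "checkout-service",
        "annotations": {
            "scheduling.example.com/intent": (
                "run this service on nodes with at least 8 GiB RAM "
                "and avoid nodes that host database pods"
            ),
        },
    },
}

# Roughly equivalent soft affinity a user would otherwise hand-write in YAML;
# label keys, values, and weights are illustrative.
hand_written_affinity = {
    "affinity": {
        "nodeAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 80,
                    "preference": {
                        "matchExpressions": [
                            {"key": "memory-tier", "operator": "In", "values": ["8Gi-plus"]}
                        ]
                    },
                }
            ]
        },
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 80,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": {"tier": "database"}},
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        },
    }
}
```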

Methodology

  1. Annotation Design – Developers annotate pod specs with a free‑form intent field (e.g., “run this service on nodes with at least 8 GiB RAM and avoid nodes that host database pods”).
  2. Scheduler Extender Hook – When the default scheduler reaches the “filter” stage, the extender forwards the intent string and a snapshot of the current cluster state to an LLM endpoint.
  3. Intent Analyzer – The LLM parses the natural language, extracts constraints (CPU, memory, node labels, co‑location preferences, etc.), and returns structured JSON that the extender translates into Kubernetes nodeAffinity, podAffinity, and podAntiAffinity objects (this step is sketched after the list).
  4. Cache Layer – To avoid pulling the full cluster state on every request, a lightweight in‑memory cache is kept up‑to‑date via the Kubernetes watch API, ensuring the LLM sees a recent view of node capacities (also sketched below).
  5. Evaluation Setup
    • Parsing Accuracy: A ground‑truth dataset of 500 intent statements was created; subset accuracy (all constraints correct) was measured for each LLM.
    • Placement Quality: Six realistic workload mixes (CPU‑heavy, memory‑heavy, mixed, conflicting soft preferences, etc.) were run on a 12‑node test cluster. The resulting pod distributions were compared against manually tuned affinity rules and a baseline heuristic parser.
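
The paper does not publish the prompt or the JSON schema the intent analyzer uses, so both are assumptions in the minimal sketch below; the Bedrock Converse call stands in for whichever hosted model is configured (the paper evaluates Amazon Nova Pro/Premier and Mistral Pixtral Large).

```python
import json

import boto3  # AWS SDK; the paper calls Bedrock-hosted models, the exact API shape here is assumed

# Hypothetical prompt: the real system's prompt and output schema are not published.
PROMPT_TEMPLATE = """You are a Kubernetes placement assistant.
Cluster nodes (name, allocatable CPU/memory, labels):
{nodes}

User intent: "{intent}"

Return ONLY a JSON object with the keys
"node_labels" (map of preferred node labels) and
"avoid_pods" (label selector of pods to stay away from)."""


def analyze_intent(intent: str, node_snapshot: list[dict], model_id: str) -> dict:
    """Turn a free-form intent into structured constraints by asking a
    Bedrock-hosted model; the extender then maps the result onto
    nodeAffinity / podAffinity / podAntiAffinity objects."""
    bedrock = boto3.client("bedrock-runtime")
    prompt = PROMPT_TEMPLATE.format(nodes=json.dumps(node_snapshot), intent=intent)
    resp = bedrock.converse(
        modelId=model_id,  # e.g. a Nova Pro model ID configured for the cluster
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    raw = resp["output"]["message"]["content"][0]["text"]
    return json.loads(raw)  # in practice: validate the schema and fall back on parse errors
```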
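
The cache layer of step 4 can be approximated with the official Kubernetes Python client; the exact fields the prototype caches are not specified, so the snapshot below (name, allocatable resources, labels) is an assumption.

```python
import threading

from kubernetes import client, config, watch


class NodeCache:
    """Keep an in-memory snapshot of node capacities fresh via the watch API,
    so each scheduling request can hand the LLM a recent cluster view without
    re-listing every node."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict] = {}
        self._lock = threading.Lock()

    def snapshot(self) -> list[dict]:
        with self._lock:
            return list(self._nodes.values())

    def run(self) -> None:
        config.load_incluster_config()  # use load_kube_config() when running outside the cluster
        v1 = client.CoreV1Api()
        for event in watch.Watch().stream(v1.list_node):
            node = event["object"]
            name = node.metadata.name
            with self._lock:
                if event["type"] == "DELETED":
                    self._nodes.pop(name, None)
                else:  # ADDED / MODIFIED
                    self._nodes[name] = {
                        "name": name,
                        "allocatable": dict(node.status.allocatable or {}),  # e.g. {"cpu": "8", "memory": "32Gi"}
                        "labels": dict(node.metadata.labels or {}),
                    }


# Assumed deployment detail: run the watch loop in a background thread
# alongside the extender's HTTP server.
cache = NodeCache()
threading.Thread(target=cache.run, daemon=True).start()
```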

Results & Findings

| Metric | Top LLM (Nova Pro) | Baseline Parser |
| --- | --- | --- |
| Subset Accuracy (parsing) | 96.3 % | 71.2 % |
| Average Scheduling Latency* | 210 ms (sync) | 45 ms |
| Placement Score (resource balance, affinity satisfaction) | +12 % vs. manual config | −8 % vs. manual config |

*Latency measured from intent receipt to affinity object generation; the synchronous LLM call dominates this time.

  • Parsing: All tested LLMs correctly identified every constraint in the majority of cases; most errors stemmed from ambiguous phrasing in the intent statements.
  • Placement: In simple scenarios the prototype matched hand‑crafted rules; in complex or conflicting soft‑affinity cases it outperformed manual configurations by better balancing resource utilization and respecting user intent.
  • Conflict Resolution: The system gracefully de‑prioritized lower‑confidence constraints, yielding feasible placements even when user hints conflicted.

Practical Implications

  • Lower learning curve – Ops teams can now express “run this on a fast node but not next to the cache layer” without mastering the full Kubernetes affinity syntax.
  • Rapid prototyping – Developers can iterate on placement strategies by editing a comment instead of redeploying YAML with intricate label selectors.
  • Cross‑team collaboration – Product managers or architects can convey high‑level placement policies in plain language that the system enforces automatically.
  • Potential for SaaS extensions – Cloud providers could expose “intent‑based scheduling” as a managed feature, letting customers fine‑tune cost vs. performance without deep cluster knowledge.
  • Integration path – The extender is a drop‑in component; existing clusters can adopt it incrementally, falling back to the default scheduler for pods that omit the intent annotation (a minimal sketch of this fallback follows).
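
As a rough illustration of that drop-in/fallback behaviour, the sketch below exposes a filter endpoint in the style of the HTTP scheduler-extender protocol; the route, port, and annotation key are assumptions, and a real deployment would wire this URL into the scheduler's extender configuration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
INTENT_KEY = "scheduling.example.com/intent"  # hypothetical annotation key, as above


@app.post("/filter")
def filter_nodes():
    # The scheduler POSTs ExtenderArgs (the pod plus candidate node names) and
    # expects an ExtenderFilterResult back; field names follow the HTTP
    # scheduler-extender protocol.
    args = request.get_json()
    pod = args["Pod"]
    node_names = args.get("NodeNames") or []

    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    intent = annotations.get(INTENT_KEY)

    if not intent:
        # No intent annotation: pass every candidate node through unchanged,
        # i.e. leave placement entirely to the default scheduler.
        return jsonify({"NodeNames": node_names, "FailedNodes": {}, "Error": ""})

    # With an intent present, this is where the LLM-backed analyzer would run
    # (see analyze_intent above) and its derived soft constraints would be applied.
    # Because the preferences are soft, no node is rejected here either; the
    # generated affinity terms bias scoring rather than filtering.
    return jsonify({"NodeNames": node_names, "FailedNodes": {}, "Error": ""})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```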

Limitations & Future Work

  • Synchronous latency – Real‑time LLM calls add noticeable overhead; the authors suggest moving to an asynchronous queue where intents are pre‑processed and cached.
  • Model dependency – Accuracy hinges on the chosen LLM; newer models may improve parsing but also increase cost.
  • Security & compliance – Sending intent strings to a managed LLM service may raise data‑privacy concerns for regulated environments.
  • Scalability of the cache – In very large clusters (hundreds of nodes) the in‑memory snapshot could become a bottleneck; a distributed cache is a possible extension.
  • Broader intent scope – Future work could explore temporal constraints (“run this only during business hours”) or cost‑aware hints (“prefer spot instances”).

By demonstrating that LLMs can reliably translate human‑friendly scheduling hints into actionable Kubernetes policies, this research opens the door to more intuitive, intent‑driven cluster management—an attractive prospect for any organization looking to streamline DevOps workflows.

Authors

  • Leszek Sliwko
  • Jolanta Mizera-Pietraszko

Paper Information

  • arXiv ID: 2601.09282v1
  • Categories: cs.AI, cs.DC, cs.LG, cs.SE
  • Published: January 14, 2026