[Paper] Incisor: Ex Ante Cloud Instance Selection for HPC Jobs

Published: April 27, 2026
Source: arXiv - 2604.24464v1

Overview

The paper introduces Incisor, a system that automatically picks the right cloud VM type for high‑performance computing (HPC) jobs before they run. Traditionally, users have to manually match their code to a suitable instance—a time‑consuming, expertise‑heavy process. Incisor leverages program analysis together with large language models (LLMs) to infer hardware needs from just the executable, inputs, and command line, achieving fully automated, high‑quality instance selection on AWS.

Key Contributions

  • End‑to‑end ex‑ante instance selection: A complete pipeline that decides the optimal EC2 instance at submission time, using only the job’s artifacts (binary/script, inputs, command).
  • LLM‑guided hardware inference: Novel integration of state‑of‑the‑art coding LLMs to translate static analysis results into concrete hardware constraints (e.g., CPU count, memory, GPU, network bandwidth).
  • Zero‑shot success on diverse workloads: Works out‑of‑the‑box for compiled C/C++/Fortran programs and Python scripts, achieving 100 % first‑run success on a benchmark suite.
  • Performance and cost gains: Compared with a strong baseline (expert‑crafted constraints + SkyPilot), Incisor reduces job runtime by 54 % and cloud spend by 44 %.
  • Open‑source prototype: The authors release the Incisor code and evaluation scripts, enabling reproducibility and community extensions.

Methodology

  1. Artifact Collection – When a user submits a job, Incisor gathers the executable (or script), its input data, and the exact command line. No prior profiling or historical runs are required.
  2. Static Program Analysis – Using widely available tools (e.g., objdump, readelf, pyright), Incisor extracts (see the extraction sketch below):
    • Instruction set architecture (x86‑64, ARM)
    • Required libraries and their versions
    • Memory allocation patterns (e.g., large buffers, MPI calls)
    • Parallelism hints (OpenMP, MPI, CUDA kernels)
  3. LLM Reasoning Layer – The extracted facts are fed to a frontier coding LLM (e.g., GPT‑4‑Turbo). A carefully engineered prompt asks the model to map these facts to concrete cloud resource specifications (see the prompt sketch below):
    • Number of vCPUs, RAM size, presence of GPUs, network bandwidth, storage type, etc.
    • Preference for instance families (e.g., c6i, p4d, r5n) based on cost‑performance trade‑offs.
  4. Instance Ranking & Selection – Incisor queries the AWS pricing/availability API, scores candidate instances against the LLM‑produced constraints, and picks the cheapest instance that satisfies all requirements (see the selection sketch below).
  5. Job Dispatch – The selected instance type is passed to the underlying scheduler (e.g., SkyPilot), which provisions the VM, transfers the artifacts, and launches the job (see the dispatch sketch below).
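
To make step 2 concrete, here is a minimal Python sketch of the kind of binary inspection Incisor could perform. The tool invocations (readelf, objdump) are real, but the specific heuristics and the facts dictionary are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the static-analysis step, assuming a Linux ELF binary.
# The readelf/objdump invocations are standard; the fact categories mirror
# the paper's description, but the exact string heuristics are illustrative.
import subprocess

def analyze_binary(path: str) -> dict:
    """Extract coarse hardware-relevant facts from an executable."""
    header = subprocess.run(["readelf", "-h", path],
                            capture_output=True, text=True).stdout
    dynamic = subprocess.run(["readelf", "-d", path],
                             capture_output=True, text=True).stdout
    symbols = subprocess.run(["objdump", "-T", path],
                             capture_output=True, text=True).stdout
    return {
        # Instruction set architecture from the ELF header's Machine field.
        "isa": "x86-64" if "X86-64" in header.upper()
               else "aarch64" if "AARCH64" in header.upper()
               else "unknown",
        # Dynamically linked libraries hint at required software stacks.
        "needs_mpi": "libmpi" in dynamic,
        "needs_cuda": "libcudart" in dynamic or "libcuda" in dynamic,
        # OpenMP runtime symbols (libgomp) suggest multithreaded parallelism.
        "uses_openmp": "GOMP_" in symbols or "omp_" in symbols,
    }
```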
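
For step 3, the paper describes feeding extracted facts to a coding LLM and post‑validating its output. The sketch below shows one plausible shape for that exchange; the prompt wording and the JSON constraint schema are assumptions, not the authors' exact prompt.

```python
# Illustrative prompt construction and post-validation for the LLM layer.
# The key names (min_vcpus, min_memory_gib, ...) are a hypothetical schema.
import json

def build_prompt(facts: dict, command: str) -> str:
    return (
        "You are sizing a cloud VM for an HPC job.\n"
        f"Static-analysis facts: {json.dumps(facts)}\n"
        f"Command line: {command}\n"
        "Return JSON with keys: min_vcpus, min_memory_gib, "
        "needs_gpu, min_network_gbps, preferred_families."
    )

def parse_constraints(llm_response: str) -> dict:
    """Post-validate the model output: reject non-JSON or missing keys."""
    constraints = json.loads(llm_response)
    required = {"min_vcpus", "min_memory_gib", "needs_gpu"}
    if not required.issubset(constraints):
        raise ValueError("LLM response missing required constraint keys")
    return constraints
```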
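
Step 4 then reduces to constrained optimization over the instance catalog: filter candidates against the hard constraints and take the cheapest match. A minimal sketch, with a hard-coded catalog standing in for the AWS pricing/availability API (prices are placeholders):

```python
# Selection sketch: cheapest instance satisfying every hard constraint.
from dataclasses import dataclass

@dataclass
class Instance:
    name: str
    vcpus: int
    memory_gib: float
    gpus: int
    price_per_hour: float  # USD, on-demand; placeholder values

CATALOG = [  # illustrative subset of EC2 instance types
    Instance("c6i.4xlarge", 16, 32.0, 0, 0.68),
    Instance("r5n.4xlarge", 16, 128.0, 0, 1.19),
    Instance("p4d.24xlarge", 96, 1152.0, 8, 32.77),
]

def select_instance(constraints: dict) -> Instance:
    feasible = [
        i for i in CATALOG
        if i.vcpus >= constraints["min_vcpus"]
        and i.memory_gib >= constraints["min_memory_gib"]
        and (i.gpus > 0 or not constraints["needs_gpu"])
    ]
    if not feasible:
        raise RuntimeError("no instance satisfies the constraints")
    return min(feasible, key=lambda i: i.price_per_hour)
```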
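
Finally, step 5 hands the chosen type to the scheduler. A hedged sketch assuming SkyPilot's Python API; the command, working directory, and cluster name are placeholders for Incisor's actual output:

```python
# Dispatch sketch using SkyPilot (sky.Task, sky.Resources, sky.launch).
import sky

def dispatch(instance_type: str, command: str) -> None:
    # workdir contents (binary, inputs) are synced to the provisioned VM.
    task = sky.Task(run=command, workdir=".")  # e.g., "./solver input.dat"
    task.set_resources(sky.Resources(cloud=sky.AWS(),
                                     instance_type=instance_type))
    # Provisions the VM, transfers the artifacts, and launches the job.
    sky.launch(task, cluster_name="incisor-job")
```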

The whole flow runs in seconds, making it practical for interactive HPC portals.

Results & Findings

| Metric | Baseline (SkyPilot + expert constraints) | Incisor |
| --- | --- | --- |
| First‑run success rate | 78 % (some jobs failed due to mismatched resources) | 100 % |
| Average runtime reduction | — | 54 % |
| Average instance cost reduction | — | 44 % |
| Time to select instance | Manual (minutes to hours) | < 5 seconds (automated) |
  • Robustness across languages: Handled 30 C, 20 C++, 15 Fortran, and 25 Python workloads without language‑specific tuning.
  • Cost‑performance balance: In many cases the LLM suggested a newer, slightly more expensive instance family that delivered enough speed‑up to offset the higher per‑hour price, resulting in net cost savings.
  • Scalability: Simulated 1,000 concurrent submissions; the selection service remained under 200 ms per request, showing that the approach scales to large HPC portals.

Practical Implications

  • Developer productivity: Data scientists and engineers can submit jobs without deep knowledge of cloud instance catalogs, freeing them to focus on algorithmic work.
  • Cloud cost optimization: Automated, workload‑aware selection trims spend for both startups and large research institutions that run many short‑lived HPC tasks.
  • Platform integration: Existing HPC‑as‑a‑service platforms (e.g., AWS Batch, Azure CycleCloud) can embed Incisor as a plug‑in to improve default instance choices.
  • Rapid adoption of new hardware: As cloud providers roll out new instance types (e.g., Graviton‑3, newer GPUs), Incisor’s LLM reasoning can instantly incorporate them without manual rule updates.
  • Reduced failure rates: By guaranteeing that required libraries, instruction sets, and accelerators are present, the system cuts the “instance‑mismatch” errors that often waste developer time.

Limitations & Future Work

  • LLM reliability: The approach depends on the LLM’s correctness; occasional hallucinations could suggest infeasible resources. The authors mitigate this with post‑validation but acknowledge residual risk.
  • Vendor lock‑in: The current prototype targets AWS EC2; extending to multi‑cloud or on‑premise clusters would require additional adapters and pricing models.
  • Dynamic workloads: Jobs whose resource needs evolve at runtime (e.g., adaptive mesh refinement) are not fully captured by static analysis alone. Future work may combine static analysis with lightweight profiling or reinforcement‑learning feedback loops.
  • Security & privacy: Shipping code snippets to an LLM (even a self‑hosted one) raises concerns for proprietary workloads; the authors plan to explore on‑prem LLM deployments and privacy‑preserving prompting.

Overall, Incisor demonstrates that coupling classic program analysis with modern LLM reasoning can automate a traditionally manual, error‑prone step in cloud HPC workflows, delivering tangible speed and cost benefits for developers and organizations alike.

Authors

  • Michael A. Laurenzano
  • Shihan Cheng
  • David A. B. Hyde

Paper Information

  • arXiv ID: 2604.24464v1
  • Categories: cs.DC
  • Published: April 27, 2026