[Paper] LLM for Large-Scale Optimization Model Auto-Formulation: A Lightweight Few-Shot Learning Approach
Source: arXiv - 2601.09635v1
Overview
The paper presents LEAN‑LLM‑OPT, a lightweight, few‑shot framework that lets large language models (LLMs) automatically translate a natural‑language problem description and its data into a complete large‑scale optimization model. By orchestrating a small team of LLM “agents” that first draft a step‑by‑step workflow and then execute it, the system sharply reduces the manual effort traditionally required to build optimization formulations for complex business decisions.
Key Contributions
- LEAN‑LLM‑OPT workflow engine – a two‑stage agent architecture (upstream workflow designer + downstream model generator) that separates planning from data‑heavy execution.
- Few‑shot prompting recipe – demonstrates that even modest LLMs (e.g., an open‑source 20B‑parameter model) can achieve strong results when guided by concise examples and a structured workflow.
- Two new benchmarks – Large‑Scale‑OR and Air‑NRM, the first publicly released suites for evaluating automatic formulation of large‑scale operations‑research problems.
- Real‑world validation – a case study on Singapore Airlines’ choice‑based revenue management problem, where LEAN‑LLM‑OPT matches or outperforms specialist‑built models.
- Open‑source release – code, data, and prompts are made available, enabling reproducibility and rapid adoption by practitioners.
Methodology
- Input – a textual description of the decision problem (e.g., “allocate seats to fare classes to maximize revenue”) plus the relevant datasets (historical bookings, capacity limits, etc.).
- Upstream agents – two LLMs collaborate to design a workflow: they retrieve similar past problems, outline the modeling steps (variable definition, constraints, objective, data preprocessing), and decide which steps can be automated with external tools (e.g., CSV parsers, statistical aggregators).
- Workflow representation – a structured list of sub‑tasks expressed in a simple domain‑specific language (DSL) that the downstream agent can read (a hypothetical rendering appears after this list).
- Downstream agent – a third LLM follows the workflow, generating the actual optimization code (typically in a modeling language such as Pyomo or AMPL). Because the planning work is already done, this agent focuses on the “hard” parts: choosing the right decision variables, formulating non‑standard constraints, and embedding business logic that generic templates cannot capture (a Pyomo sketch follows this list).
- Few‑shot prompting – the system supplies a handful of annotated examples for each sub‑task, allowing the LLM to infer the pattern without exhaustive fine‑tuning.
- Execution & verification – the generated model is compiled, solved with a commercial or open‑source optimizer, and its solution quality is compared against a baseline built by human experts.
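The paper does not reproduce its workflow DSL in this summary, so the sketch below is a hypothetical rendering in Python of what the upstream agents' output might look like. The field names (`step`, `tool`, `depends_on`, `hint`) are illustrative assumptions, not the paper's actual schema; the point is the separation between tool-automatable steps and LLM-reserved modeling steps.

```python
# Hypothetical workflow representation (NOT the paper's DSL): a structured
# sub-task list the downstream agent can walk through in dependency order.
workflow = [
    {"step": "load_data",          "tool": "csv_parser",
     "args": {"files": ["bookings.csv", "capacity.csv"]}},
    {"step": "aggregate_demand",   "tool": "stat_aggregator",
     "depends_on": ["load_data"]},
    # Modeling steps are left to the downstream LLM, guided by short hints.
    {"step": "define_variables",   "tool": "llm",
     "hint": "seats allocated per fare class"},
    {"step": "define_constraints", "tool": "llm",
     "hint": "cabin capacity, per-class demand upper bounds"},
    {"step": "define_objective",   "tool": "llm",
     "hint": "maximize expected revenue"},
    {"step": "emit_model",         "tool": "llm", "target": "pyomo",
     "depends_on": ["define_variables", "define_constraints",
                    "define_objective"]},
]
```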
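To make the downstream agent's output concrete, here is a minimal Pyomo sketch of the kind of model it might emit for the seat‑allocation example above, including the solve step from the verification stage. This is not the paper's generated code: the fare classes, prices, and demand figures are invented for illustration, and the CBC solver is an assumption (any LP/MIP solver Pyomo supports would do).

```python
# Minimal sketch of a generated seat-allocation model: allocate seats to
# fare classes to maximize revenue under a capacity limit. All data values
# are illustrative.
import pyomo.environ as pyo

fares = {"Y": 450.0, "M": 280.0, "Q": 150.0}   # revenue per seat by class
demand = {"Y": 20, "M": 60, "Q": 120}          # forecast demand caps
capacity = 150                                  # cabin seat capacity

model = pyo.ConcreteModel()
model.classes = pyo.Set(initialize=list(fares))
# Decision variable: seats allocated to each fare class.
model.x = pyo.Var(model.classes, domain=pyo.NonNegativeIntegers)

# Objective: maximize total expected revenue.
model.revenue = pyo.Objective(
    expr=sum(fares[c] * model.x[c] for c in model.classes),
    sense=pyo.maximize,
)
# Constraints: respect cabin capacity and per-class demand forecasts.
model.cap = pyo.Constraint(
    expr=sum(model.x[c] for c in model.classes) <= capacity)
model.dem = pyo.Constraint(
    model.classes, rule=lambda m, c: m.x[c] <= demand[c])

# Execution & verification step: solve and inspect the allocation
# (assumes the CBC solver is installed).
pyo.SolverFactory("cbc").solve(model)
print({c: pyo.value(model.x[c]) for c in model.classes})
```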
Results & Findings
| Configuration | Large‑Scale‑OR benchmark | Revenue‑management case (Singapore Airlines) |
|---|---|---|
| LEAN‑LLM‑OPT (GPT‑4.1) | 92% of expert‑level objective value; 1.8× speedup vs. manual coding | Top‑3 performance across 5 demand scenarios; 4% revenue lift over the incumbent system |
| LEAN‑LLM‑OPT (gpt‑oss‑20B) | 85% of expert baseline; comparable to prior state‑of‑the‑art LLM pipelines | Competitive with proprietary solutions; 2% revenue lift |
- The workflow‑first design reduces downstream token usage by ~30%, cutting inference cost.
- Ablation studies show that removing the upstream planning agents drops solution quality by ~10% and increases failure rates (syntax errors, missing constraints).
- Compared to a monolithic LLM prompting approach, LEAN‑LLM‑OPT attains higher consistency across diverse problem families (supply‑chain, scheduling, network design).
Practical Implications
- Rapid prototyping – Data scientists can describe a new optimization problem in plain English and obtain a runnable model within minutes, accelerating proof‑of‑concept cycles.
- Lowering the expertise barrier – Teams without deep OR expertise can still generate high‑quality formulations, democratizing access to advanced decision‑support tools.
- Cost efficiency – By leveraging few‑shot prompting rather than full fine‑tuning, organizations can reuse existing LLM APIs (including cheaper open‑source variants) without large GPU training budgets.
- Integration pipeline – The workflow DSL can be embedded into CI/CD pipelines, automatically updating models when data schemas evolve and thus supporting continuous‑optimization deployments (see the schema‑watch sketch after this list).
- Domain‑specific extensions – The modular agent design makes it straightforward to plug in custom data‑preprocessing utilities (e.g., time‑series forecasting) or domain libraries (e.g., airline revenue‑management heuristics).
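As a sketch of the CI/CD integration idea, the snippet below re‑triggers model generation only when a dataset's header row (its schema) changes. The `maybe_regenerate` helper and the fingerprint cache file are hypothetical conveniences for illustration, not part of the released code.

```python
# Hypothetical CI hook: rebuild the optimization model only on schema change.
import hashlib
import pathlib

CACHE = pathlib.Path(".schema_fingerprint")

def schema_fingerprint(csv_path: str) -> str:
    # Hash only the header row, so renamed/added columns trigger a rebuild
    # while ordinary data updates do not.
    header = pathlib.Path(csv_path).read_text().splitlines()[0]
    return hashlib.sha256(header.encode()).hexdigest()

def maybe_regenerate(csv_path: str) -> bool:
    fp = schema_fingerprint(csv_path)
    if CACHE.exists() and CACHE.read_text() == fp:
        return False  # schema unchanged: keep the current model
    CACHE.write_text(fp)
    # Hook point: re-run the upstream/downstream agents on the new schema.
    return True
```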
Limitations & Future Work
- Scalability of prompts – Very large problem descriptions may exceed token limits; future work could explore hierarchical chunking or retrieval‑augmented generation.
- Robustness to ambiguous specifications – The system still relies on relatively well‑structured natural‑language inputs; handling vague business language remains an open challenge.
- Solver dependency – Performance gains are tied to the underlying optimizer; integrating solver‑aware feedback loops could further improve model quality.
- Benchmark breadth – While Large‑Scale‑OR and Air‑NRM cover many classic OR domains, additional benchmarks (e.g., energy grid dispatch, logistics routing) would strengthen generalizability claims.
- Explainability – Translating the generated model back into human‑readable rationale is limited; future versions could output a “model‑explanation” report alongside the code.
Authors
- Kuo Liang
- Yuhang Lu
- Jianming Mao
- Shuyi Sun
- Chunwei Yang
- Congcong Zeng
- Xiao Jin
- Hanzhang Qin
- Ruihao Zhu
- Chung-Piaw Teo
Paper Information
- arXiv ID: 2601.09635v1
- Categories: cs.AI, cs.LG
- Published: January 14, 2026