[Paper] LLM for Large-Scale Optimization Model Auto-Formulation: A Lightweight Few-Shot Learning Approach
Source: arXiv - 2601.09635v1
Overview
The paper presents LEAN‑LLM‑OPT, a lightweight, few‑shot framework that lets large language models (LLMs) automatically translate a natural‑language problem description and its data into a complete large‑scale optimization model. By orchestrating a small team of LLM “agents” that first draft a step‑by‑step workflow and then execute it, the system sharply reduces the manual effort traditionally required to build optimization formulations for complex business decisions.
Key Contributions
- LEAN‑LLM‑OPT workflow engine – a two‑stage agent architecture (upstream workflow designer + downstream model generator) that separates planning from data‑heavy execution.
- Few‑shot prompting recipe – demonstrates that even modest LLMs (e.g., an open‑source 20B‑parameter model) can achieve strong results when guided by concise examples and a structured workflow.
- Two new benchmarks – Large‑Scale‑OR and Air‑NRM, the first publicly released suites for evaluating automatic formulation of large‑scale operations‑research problems.
- Real‑world validation – a case study on Singapore Airlines’ choice‑based revenue management problem, where LEAN‑LLM‑OPT matches or outperforms specialist‑built models.
- Open‑source release – code, data, and prompts are made available, enabling reproducibility and rapid adoption by practitioners.
Methodology
- Input – a textual description of the decision problem (e.g., “allocate seats to fare classes to maximize revenue”) plus the relevant datasets (historical bookings, capacity limits, etc.).
- Upstream agents – two LLMs collaborate to design a workflow: they retrieve similar past problems, outline the modeling steps (variable definition, constraints, objective, data preprocessing), and decide which steps can be automated with external tools (e.g., CSV parsers, statistical aggregators).
- Workflow representation – a structured list of sub‑tasks expressed in a simple domain‑specific language (DSL) that the downstream agent can read (a hypothetical rendering appears after this list).
- Downstream agent – a third LLM follows the workflow, generating the actual optimization code (typically in a modeling language such as Pyomo or AMPL). Because the planning work is already done, this agent focuses on the “hard” parts: choosing the right decision variables, formulating non‑standard constraints, and embedding business logic that generic templates cannot capture (a Pyomo sketch follows this list).
- Few‑shot prompting – the system supplies a handful of annotated examples for each sub‑task, allowing the LLM to infer the pattern without exhaustive fine‑tuning.
- Execution & verification – the generated model is compiled, solved with a commercial or open‑source optimizer, and its solution quality is compared against a baseline built by human experts.
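The paper does not reproduce its workflow DSL in this summary, so the sketch below is a hypothetical rendering in Python of what the upstream agents' output might look like. The field names (`step`, `tool`, `depends_on`, `hint`) are illustrative assumptions, not the paper's actual schema; the point is the separation between tool-automatable steps and LLM-reserved modeling steps.

```python
# Hypothetical workflow representation (NOT the paper's DSL): a structured
# sub-task list the downstream agent can walk through in dependency order.
workflow = [
    {"step": "load_data",          "tool": "csv_parser",
     "args": {"files": ["bookings.csv", "capacity.csv"]}},
    {"step": "aggregate_demand",   "tool": "stat_aggregator",
     "depends_on": ["load_data"]},
    # Modeling steps are left to the downstream LLM, guided by short hints.
    {"step": "define_variables",   "tool": "llm",
     "hint": "seats allocated per fare class"},
    {"step": "define_constraints", "tool": "llm",
     "hint": "cabin capacity, per-class demand upper bounds"},
    {"step": "define_objective",   "tool": "llm",
     "hint": "maximize expected revenue"},
    {"step": "emit_model",         "tool": "llm", "target": "pyomo",
     "depends_on": ["define_variables", "define_constraints",
                    "define_objective"]},
]
```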
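To make the downstream agent's output concrete, here is a minimal Pyomo sketch of the kind of model it might emit for the seat‑allocation example above, including the solve step from the verification stage. This is not the paper's generated code: the fare classes, prices, and demand figures are invented for illustration, and the CBC solver is an assumption (any LP/MIP solver Pyomo supports would do).

```python
# Minimal sketch of a generated seat-allocation model: allocate seats to
# fare classes to maximize revenue under a capacity limit. All data values
# are illustrative.
import pyomo.environ as pyo

fares = {"Y": 450.0, "M": 280.0, "Q": 150.0}   # revenue per seat by class
demand = {"Y": 20, "M": 60, "Q": 120}          # forecast demand caps
capacity = 150                                  # cabin seat capacity

model = pyo.ConcreteModel()
model.classes = pyo.Set(initialize=list(fares))
# Decision variable: seats allocated to each fare class.
model.x = pyo.Var(model.classes, domain=pyo.NonNegativeIntegers)

# Objective: maximize total expected revenue.
model.revenue = pyo.Objective(
    expr=sum(fares[c] * model.x[c] for c in model.classes),
    sense=pyo.maximize,
)
# Constraints: respect cabin capacity and per-class demand forecasts.
model.cap = pyo.Constraint(
    expr=sum(model.x[c] for c in model.classes) <= capacity)
model.dem = pyo.Constraint(
    model.classes, rule=lambda m, c: m.x[c] <= demand[c])

# Execution & verification step: solve and inspect the allocation
# (assumes the CBC solver is installed).
pyo.SolverFactory("cbc").solve(model)
print({c: pyo.value(model.x[c]) for c in model.classes})
```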
Results & Findings
| Configuration | Large‑Scale‑OR benchmark | Revenue‑management case (Singapore Airlines) |
|---|---|---|
| LEAN‑LLM‑OPT (GPT‑4.1) | 92% of expert‑level objective value; 1.8× speedup vs. manual coding | Top‑3 performance across 5 demand scenarios; 4% revenue lift over the incumbent system |
| LEAN‑LLM‑OPT (gpt‑oss‑20B) | 85% of expert baseline; comparable to prior state‑of‑the‑art LLM pipelines | Competitive with proprietary solutions; 2% revenue lift |
- The workflow‑first design reduces downstream token usage by ~30%, cutting inference cost.
- Ablation studies show that removing the upstream planning agents drops solution quality by ~10% and increases failure rates (syntax errors, missing constraints).
- Compared to a monolithic LLM prompting approach, LEAN‑LLM‑OPT attains higher consistency across diverse problem families (supply‑chain, scheduling, network design).
Practical Implications
- Rapid prototyping – Data scientists can describe a new optimization problem in plain English and obtain a runnable model within minutes, accelerating proof‑of‑concept cycles.
- Lowering the expertise barrier – Teams without deep OR expertise can still generate high‑quality formulations, democratizing access to advanced decision‑support tools.
- Cost efficiency – By leveraging few‑shot prompting rather than full fine‑tuning, organizations can reuse existing LLM APIs (including cheaper open‑source variants) without large GPU training budgets.
- Integration pipeline – The workflow DSL can be embedded into CI/CD pipelines, automatically updating models when data schemas evolve and thus supporting continuous‑optimization deployments (see the schema‑watch sketch after this list).
- Domain‑specific extensions – The modular agent design makes it straightforward to plug in custom data‑preprocessing utilities (e.g., time‑series forecasting) or domain libraries (e.g., airline revenue‑management heuristics).
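As a sketch of the CI/CD integration idea, the snippet below re‑triggers model generation only when a dataset's header row (its schema) changes. The `maybe_regenerate` helper and the fingerprint cache file are hypothetical conveniences for illustration, not part of the released code.

```python
# Hypothetical CI hook: rebuild the optimization model only on schema change.
import hashlib
import pathlib

CACHE = pathlib.Path(".schema_fingerprint")

def schema_fingerprint(csv_path: str) -> str:
    # Hash only the header row, so renamed/added columns trigger a rebuild
    # while ordinary data updates do not.
    header = pathlib.Path(csv_path).read_text().splitlines()[0]
    return hashlib.sha256(header.encode()).hexdigest()

def maybe_regenerate(csv_path: str) -> bool:
    fp = schema_fingerprint(csv_path)
    if CACHE.exists() and CACHE.read_text() == fp:
        return False  # schema unchanged: keep the current model
    CACHE.write_text(fp)
    # Hook point: re-run the upstream/downstream agents on the new schema.
    return True
```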
Limitations & Future Work
- Scalability of prompts – Very large problem descriptions may exceed token limits; future work could explore hierarchical chunking or retrieval‑augmented generation.
- Robustness to ambiguous specifications – The system still relies on relatively well‑structured natural‑language inputs; handling vague business language remains an open challenge.
- Solver dependency – Performance gains are tied to the underlying optimizer; integrating solver‑aware feedback loops could further improve model quality.
- Benchmark breadth – While Large‑Scale‑OR and Air‑NRM cover many classic OR domains, additional benchmarks (e.g., energy grid dispatch, logistics routing) would strengthen generalizability claims.
- Explainability – Translating the generated model back into human‑readable rationale is limited; future versions could output a “model‑explanation” report alongside the code.
Authors
- Kuo Liang
- Yuhang Lu
- Jianming Mao
- Shuyi Sun
- Chunwei Yang
- Congcong Zeng
- Xiao Jin
- Hanzhang Qin
- Ruihao Zhu
- Chung-Piaw Teo
Paper Information
- arXiv ID: 2601.09635v1
- Categories: cs.AI, cs.LG
- Published: January 14, 2026