[Paper] LLM for Large-Scale Optimization Model Auto-Formulation: A Lightweight Few-Shot Learning Approach

Published: January 14, 2026 at 12:09 PM EST
4 min read
Source: arXiv - 2601.09635v1

Overview

The paper presents LEAN‑LLM‑OPT, a lightweight, few‑shot framework that lets large language models (LLMs) automatically translate a natural‑language problem description and its data into a full‑blown large‑scale optimization model. By orchestrating a small team of LLM “agents” that first draft a step‑by‑step workflow and then execute it, the system dramatically cuts the manual effort traditionally required to build optimization formulations for complex business decisions.

Key Contributions

  • LEAN‑LLM‑OPT workflow engine – a two‑stage agent architecture (upstream workflow designer + downstream model generator) that separates planning from data‑heavy execution.
  • Few‑shot prompting recipe – demonstrates that even modest LLMs (e.g., the open‑source gpt‑oss‑20B) can achieve strong results when guided by concise examples and a structured workflow.
  • Two new benchmarks – Large‑Scale‑OR and Air‑NRM, the first publicly released suites for evaluating automatic formulation of large‑scale operations‑research problems.
  • Real‑world validation – a case study on Singapore Airlines’ choice‑based revenue management problem, where LEAN‑LLM‑OPT matches or outperforms specialist‑built models.
  • Open‑source release – code, data, and prompts are made available, enabling reproducibility and rapid adoption by practitioners.

Methodology

  1. Input – a textual description of the decision problem (e.g., “allocate seats to fare classes to maximize revenue”) plus the relevant datasets (historical bookings, capacity limits, etc.).
  2. Upstream agents – two LLMs collaborate to design a workflow: they retrieve similar past problems, outline the modeling steps (variable definition, constraints, objective, data preprocessing), and decide which steps can be automated with external tools (e.g., CSV parsers, statistical aggregators).
  3. Workflow representation – a structured list of sub‑tasks expressed in a simple domain‑specific language (DSL) that the downstream agent can read (a sketch of one possible representation follows this list).
  4. Downstream agent – a third LLM follows the workflow, generating the actual optimization code (typically in a modeling language like Pyomo or AMPL; see the Pyomo sketch after this list). Because the planning work is already done, this agent focuses on the “hard” parts: choosing the right decision variables, formulating non‑standard constraints, and embedding business logic that cannot be captured by generic templates.
  5. Few‑shot prompting – the system supplies a handful of annotated examples for each sub‑task, allowing the LLM to infer the pattern without exhaustive fine‑tuning.
  6. Execution & verification – the generated model is compiled, solved with a commercial or open‑source optimizer, and its solution quality is compared against a baseline built by human experts.
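
To make the two‑stage split concrete, here is a minimal sketch of what the upstream agents' workflow representation might look like for the seat‑allocation example. The schema (field names, step types, tool names) is an illustrative assumption, not the paper's actual DSL.

```python
# Sketch of a workflow the upstream agents might hand to the downstream
# agent. All field names and step types here are illustrative assumptions.
workflow = [
    {"step": 1, "task": "preprocess_data", "tool": "csv_parser",
     "inputs": ["bookings.csv", "capacity.csv"]},
    {"step": 2, "task": "define_variables",
     "hint": "one nonnegative seat-allocation variable per fare class"},
    {"step": 3, "task": "add_constraints",
     "hint": "demand cap per class; total allocation within cabin capacity"},
    {"step": 4, "task": "set_objective",
     "hint": "maximize revenue = sum of fare * seats allocated"},
    {"step": 5, "task": "emit_model", "target": "pyomo"},
]
```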
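And here is a minimal sketch of the kind of Pyomo model the downstream agent could emit from that workflow. The fare classes, demand figures, and choice of the GLPK solver are hypothetical stand‑ins for the paper's real inputs, not its actual output.

```python
import pyomo.environ as pyo

# Hypothetical toy data for the "allocate seats to fare classes" example;
# real inputs would come from the preprocessing steps in the workflow.
fares = {"Y": 500.0, "M": 300.0, "K": 150.0}  # fare class -> ticket price
demand = {"Y": 40, "M": 90, "K": 200}         # forecast demand per class
capacity = 180                                # physical cabin capacity

model = pyo.ConcreteModel()
model.classes = pyo.Set(initialize=list(fares))

# Decision variable: seats allocated to each fare class.
model.alloc = pyo.Var(model.classes, domain=pyo.NonNegativeIntegers)

# Do not allocate more seats to a class than its forecast demand.
model.demand_cap = pyo.Constraint(
    model.classes, rule=lambda m, c: m.alloc[c] <= demand[c])

# Total allocation cannot exceed cabin capacity.
model.capacity_cap = pyo.Constraint(
    rule=lambda m: sum(m.alloc[c] for c in m.classes) <= capacity)

# Objective: maximize revenue from allocated seats.
model.revenue = pyo.Objective(
    rule=lambda m: sum(fares[c] * m.alloc[c] for c in m.classes),
    sense=pyo.maximize)

# Any LP/MIP solver works here; GLPK is just one open-source choice.
pyo.SolverFactory("glpk").solve(model)
print({c: int(pyo.value(model.alloc[c])) for c in model.classes})
```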

Results & Findings

| Setting | LLM used | Benchmark (Large‑Scale‑OR) | Revenue‑management case (SG Air) |
|---|---|---|---|
| LEAN‑LLM‑OPT (GPT‑4.1) | GPT‑4.1 | 92% of expert‑level objective value; 1.8× speedup vs. manual coding | Top‑3 performance across 5 demand scenarios; 4% revenue lift over the incumbent system |
| LEAN‑LLM‑OPT (gpt‑oss‑20B) | Open‑source 20B | 85% of expert baseline; comparable to prior state‑of‑the‑art LLM pipelines | Competitive with proprietary solutions, achieving 2% lift |

  • The workflow‑first design reduces downstream token usage by ~30%, cutting inference cost.
  • Ablation studies show that removing the upstream planning agents drops solution quality by ~10% and increases failure rates (syntax errors, missing constraints).
  • Compared to a monolithic LLM prompting approach, LEAN‑LLM‑OPT attains higher consistency across diverse problem families (supply‑chain, scheduling, network design).

Practical Implications

  • Rapid prototyping – Data scientists can describe a new optimization problem in plain English and obtain a runnable model within minutes, accelerating proof‑of‑concept cycles.
  • Skill levelling – Teams without deep OR expertise can still generate high‑quality formulations, democratizing access to advanced decision‑support tools.
  • Cost efficiency – By leveraging few‑shot prompting rather than full fine‑tuning, organizations can reuse existing LLM APIs (including cheaper open‑source variants) without large GPU training budgets.
  • Integration pipeline – The workflow DSL can be embedded into CI/CD pipelines, automatically updating models when data schemas evolve and thereby supporting continuous‑optimization deployments (see the sketch after this list).
  • Domain‑specific extensions – The modular agent design makes it straightforward to plug in custom data‑preprocessing utilities (e.g., time‑series forecasting) or domain libraries (e.g., airline revenue‑management heuristics).
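
As a rough illustration of the integration‑pipeline bullet above, the snippet below regenerates the model only when the data schema has drifted. This is a sketch under assumed conventions: the `lean_llm_opt` entry point, the file layout, and the `--smoke-test` flag are all hypothetical, since the paper does not prescribe a CI interface.

```python
import hashlib
import pathlib
import subprocess

# Hypothetical layout: schema.json describes the current data schema;
# build/schema.sha256 records the schema the current model was built from.
SCHEMA = pathlib.Path("data/schema.json")
STAMP = pathlib.Path("build/schema.sha256")

digest = hashlib.sha256(SCHEMA.read_bytes()).hexdigest()
if not STAMP.exists() or STAMP.read_text() != digest:
    # Schema drifted: ask the agent pipeline for a fresh formulation ...
    subprocess.run(
        ["python", "-m", "lean_llm_opt",  # hypothetical CLI entry point
         "--problem", "problem.md", "--data", "data/"],
        check=True)
    # ... then solve a small held-out instance as a smoke test.
    subprocess.run(["python", "generated_model.py", "--smoke-test"],
                   check=True)
    STAMP.parent.mkdir(parents=True, exist_ok=True)
    STAMP.write_text(digest)
```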

Limitations & Future Work

  • Scalability of prompts – Very large problem descriptions may exceed token limits; future work could explore hierarchical chunking or retrieval‑augmented generation.
  • Robustness to ambiguous specifications – The system still relies on relatively well‑structured natural‑language inputs; handling vague business language remains an open challenge.
  • Solver dependency – Performance gains are tied to the underlying optimizer; integrating solver‑aware feedback loops could further improve model quality.
  • Benchmark breadth – While Large‑Scale‑OR and Air‑NRM cover many classic OR domains, additional benchmarks (e.g., energy grid dispatch, logistics routing) would strengthen generalizability claims.
  • Explainability – Translating the generated model back into human‑readable rationale is limited; future versions could output a “model‑explanation” report alongside the code.

Authors

  • Kuo Liang
  • Yuhang Lu
  • Jianming Mao
  • Shuyi Sun
  • Chunwei Yang
  • Congcong Zeng
  • Xiao Jin
  • Hanzhang Qin
  • Ruihao Zhu
  • Chung-Piaw Teo

Paper Information

  • arXiv ID: 2601.09635v1
  • Categories: cs.AI, cs.LG
  • Published: January 14, 2026