[Paper] Text-to-SQL as Dual-State Reasoning: Integrating Adaptive Context and Progressive Generation
Source: arXiv - 2511.21402v1
Overview
The paper presents DSR‑SQL, a new “dual‑state” reasoning framework that tackles Text‑to‑SQL generation for large, real‑world databases. By separating the problem into an adaptive context that trims and clarifies the schema, and a progressive generation loop that iteratively builds and self‑corrects the SQL query, the authors achieve strong results on challenging benchmarks without any extra fine‑tuning or hand‑crafted prompts.
Key Contributions
- Dual‑State Reasoning – Introduces two interacting states (context and generation) that jointly guide the model, a departure from single‑pass or pure chain‑of‑thought approaches.
- Adaptive Context Construction – Automatically refines massive schemas into a compact, semantically faithful representation, reducing context overload and improving schema linking.
- Feedback‑Guided Progressive Generation – Models SQL synthesis as a series of state transitions where the model can inspect its own partial output, receive execution feedback, and revise the query on‑the‑fly.
- Zero‑Shot Performance – Achieves 35.28% execution accuracy on Spider 2.0‑Snow and 68.32% on the BIRD dev set without any post‑training, in‑context examples, or external tools.
- Open‑Source Release – Provides a ready‑to‑run implementation (GitHub) for the community to reproduce and extend.
Methodology
Adaptive Context State
- Starts from the full database schema (tables, columns, foreign‑key relations).
- A lightweight selector (trained on schema‑linking signals) prunes irrelevant tables/columns based on the natural‑language question.
- The remaining “context” is encoded into a concise prompt that fits within LLM context windows while preserving key semantic relationships.
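The pruning step above can be sketched as a simple lexical selector. This is an illustrative toy, not the paper's actual trained selector: the scoring (token overlap between the question and table/column names), the cutoff, and all names are assumptions.

```python
# Hypothetical sketch of adaptive context construction: prune a large schema
# down to the tables most relevant to the question. The overlap-based scoring
# is illustrative; the paper's selector uses schema-linking signals.

def build_adaptive_context(schema, question, max_tables=3):
    """Return a compact schema prompt keeping only tables whose names or
    columns lexically overlap with the question's tokens."""
    q_tokens = set(question.lower().replace("?", "").split())
    scored = []
    for table, columns in schema.items():
        # Score = number of table/column names that contain a question token
        names = {table.lower()} | {c.lower() for c in columns}
        score = sum(1 for n in names if any(tok in n for tok in q_tokens))
        scored.append((score, table, columns))
    scored.sort(reverse=True)
    kept = [(t, cols) for s, t, cols in scored[:max_tables] if s > 0]
    # Serialize the pruned schema into a prompt-friendly string
    return "\n".join(f"TABLE {t} ({', '.join(cols)})" for t, cols in kept)

schema = {
    "orders": ["order_id", "customer_id", "total"],
    "customers": ["customer_id", "name", "country"],
    "audit_log": ["event_id", "payload"],
}
print(build_adaptive_context(schema, "Which customers placed orders?"))
```

Only `orders` and `customers` survive the pruning here; `audit_log` scores zero and is dropped, shrinking the prompt that reaches the LLM.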
Progressive Generation State
- The LLM receives the adaptive context and begins generating a partial SQL statement.
- After each generation step, the partial query is executed against a sandboxed DB instance.
- Execution results (e.g., error messages, row counts) are fed back as a new “state” token, prompting the model to adjust the next fragment.
- This loop continues until the model produces a syntactically correct, semantically aligned query that yields the expected result.
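The execution-feedback loop above can be sketched with an in-memory SQLite sandbox. The `revise` function stands in for the LLM's revision step and is a hard-coded stub here; everything except the loop structure is an assumption.

```python
import sqlite3

# Minimal sketch of feedback-guided generation: execute a candidate query in a
# sandboxed in-memory DB and feed any error back for revision, mimicking the
# paper's state-transition loop. `revise` is a stub, not the paper's model.

def execute_with_feedback(conn, candidate_sql, revise, max_steps=3):
    """Run candidate SQL; on failure, pass the error message to `revise`
    and retry with the revised query."""
    for _ in range(max_steps):
        try:
            rows = conn.execute(candidate_sql).fetchall()
            return candidate_sql, rows  # success state: query executed
        except sqlite3.Error as err:
            # Execution feedback becomes the next "state" the model conditions on
            candidate_sql = revise(candidate_sql, str(err))
    raise RuntimeError("no valid query within the step budget")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

# Stub reviser: fixes a misspelled table name when the error mentions it
def revise(sql, error):
    return sql.replace("user", "users") if "no such table" in error else sql

sql, rows = execute_with_feedback(conn, "SELECT name FROM user", revise)
print(sql, rows)  # SELECT name FROM users [('Ada',)]
```

The first attempt fails with "no such table: user"; that error is fed back, the query is revised, and the second pass succeeds.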
Dual‑State Interaction
- The two states exchange information: the context can be refreshed if the generation reveals missing schema elements, and the generation can request additional context cues.
- The whole process is orchestrated by a simple controller that tracks state transitions, requiring no extra training data beyond the standard Text‑to‑SQL corpora.
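A controller of this kind can be sketched as a small loop that refreshes the context state whenever generation signals a missing schema element. Every callable below is a toy stand-in (not the paper's components), and the `missing:<table>` feedback convention is invented for illustration.

```python
# Illustrative dual-state controller: alternate between the context state and
# the generation state until a query is accepted. All components are stubs.

def dual_state_controller(question, build_context, generate, execute, max_rounds=5):
    hints = set()
    context = build_context(question, hints)
    for _ in range(max_rounds):
        sql = generate(question, context)
        ok, feedback = execute(sql)
        if ok:
            return sql
        if feedback.startswith("missing:"):
            # Generation revealed a schema element absent from the context;
            # refresh the context state with the new hint and retry.
            hints.add(feedback.split(":", 1)[1])
            context = build_context(question, hints)
    raise RuntimeError("no accepted query within the round budget")

# Toy components: the initial context omits the `orders` table on purpose.
def build_context(question, hints):
    return {"customers"} | hints

def generate(question, context):
    return ("SELECT * FROM customers JOIN orders" if "orders" in context
            else "SELECT * FROM customers")

def execute(sql):
    return (True, "") if "orders" in sql else (False, "missing:orders")

print(dual_state_controller("Which customers placed orders?",
                            build_context, generate, execute))
```

Round one fails because the context lacks `orders`; the controller folds that hint back into the context, and round two produces an accepted query, with no training beyond the components themselves.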
Results & Findings
| Benchmark | Execution Accuracy |
|---|---|
| Spider 2.0‑Snow (zero‑shot) | 35.28% |
| BIRD dev set (zero‑shot) | 68.32% |
- These numbers are competitive with methods that rely on heavy fine‑tuning or large prompt libraries.
- Ablation studies show that removing either the adaptive context or the feedback loop drops performance by ~10–15%, confirming that both states are essential.
- Error analysis indicates that most remaining failures stem from ambiguous natural‑language questions rather than schema‑linking or syntax errors.
Practical Implications
- Enterprise Data Access – Developers can embed DSR‑SQL into BI tools or chat‑ops assistants, letting non‑technical users ask complex questions over massive schemas without hitting LLM context limits.
- Reduced Engineering Overhead – Because the approach works zero‑shot, teams don’t need to maintain costly fine‑tuned models for each new database; a single LLM (e.g., GPT‑4‑Turbo) can be reused across projects.
- Self‑Correcting Pipelines – The feedback‑guided generation can be wrapped into automated ETL validation steps, catching malformed queries before they hit production databases.
- Extensibility – The open‑source codebase makes it straightforward to plug in custom schema selectors, domain‑specific execution monitors, or even integrate with LLMs hosted on‑prem for privacy‑sensitive environments.
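One way to realize the self-correcting pipeline idea above is to validate generated SQL against a schema-only sandbox before it reaches production. This sketch uses SQLite's `EXPLAIN`, which compiles a statement without executing it; the schema and function names are hypothetical, and a production ETL step would target its own dialect.

```python
import sqlite3

# Pre-production SQL validation: compile a generated query against an empty
# copy of the schema. No data is touched, so malformed queries are caught
# cheaply before they hit a real database.

def validate_sql(schema_ddl, sql):
    """Return (True, None) if `sql` compiles against the schema,
    else (False, error_message)."""
    conn = sqlite3.connect(":memory:")  # sandbox holds schema only, no rows
    conn.executescript(schema_ddl)
    try:
        conn.execute("EXPLAIN " + sql)  # compile-only check in SQLite
        return True, None
    except sqlite3.Error as err:
        return False, str(err)
    finally:
        conn.close()

ddl = "CREATE TABLE sales (id INTEGER, amount REAL);"
print(validate_sql(ddl, "SELECT SUM(amount) FROM sales"))  # (True, None)
print(validate_sql(ddl, "SELECT SUM(amount) FROM salez"))  # fails: no such table
```

A gate like this can run inside CI or an ETL orchestrator, rejecting queries before deployment rather than at runtime.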
Limitations & Future Work
- Scalability of Execution Feedback – Running partial queries after every generation step can be costly for very large tables; smarter caching or static analysis could mitigate this.
- Ambiguity Handling – The current controller assumes a single correct answer; future work could incorporate clarification dialogs to resolve ambiguous user intents.
- Domain‑Specific Semantics – While the adaptive context captures schema structure, deeper business logic (e.g., fiscal calendars, custom functions) still requires manual extensions.
- Benchmark Diversity – The paper evaluates on Spider 2.0‑Snow and BIRD; testing on more industry‑specific datasets (e.g., healthcare, finance) would further validate real‑world robustness.
DSR‑SQL shows that a disciplined, two‑state reasoning loop can bridge the gap between powerful LLMs and the practical constraints of enterprise databases, opening the door for more reliable, zero‑shot Text‑to‑SQL assistants.
Authors
- Zhifeng Hao
- Qibin Song
- Ruichu Cai
- Boyan Xu
Paper Information
- arXiv ID: 2511.21402v1
- Categories: cs.CL
- Published: November 26, 2025