[Paper] Text-to-SQL as Dual-State Reasoning: Integrating Adaptive Context and Progressive Generation
Source: arXiv - 2511.21402v1
Overview
The paper presents DSR‑SQL, a new “dual‑state” reasoning framework that tackles Text‑to‑SQL generation for large, real‑world databases. By separating the problem into an adaptive context that trims and clarifies the schema, and a progressive generation loop that iteratively builds and self‑corrects the SQL query, the authors achieve strong results on challenging benchmarks without any extra fine‑tuning or hand‑crafted prompts.
Key Contributions
- Dual‑State Reasoning – Introduces two interacting states (context and generation) that jointly guide the model, a departure from single‑pass or pure chain‑of‑thought approaches.
- Adaptive Context Construction – Automatically refines massive schemas into a compact, semantically faithful representation, reducing context overload and improving schema linking.
- Feedback‑Guided Progressive Generation – Models SQL synthesis as a series of state transitions where the model can inspect its own partial output, receive execution feedback, and revise the query on‑the‑fly.
- Zero‑Shot Performance – Achieves 35.28% execution accuracy on Spider 2.0‑Snow and 68.32% on the BIRD dev set without any post‑training, in‑context examples, or external tools.
- Open‑Source Release – Provides a ready‑to‑run implementation (GitHub) for the community to reproduce and extend.
Methodology
Adaptive Context State
- Starts from the full database schema (tables, columns, foreign‑key relations).
- A lightweight selector (trained on schema‑linking signals) prunes irrelevant tables/columns based on the natural‑language question.
- The remaining “context” is encoded into a concise prompt that fits within LLM context windows while preserving key semantic relationships.
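The pruning step above can be sketched as a simple lexical selector. This is an illustrative toy, not the paper's actual trained selector: the scoring (token overlap between the question and table/column names), the cutoff, and all names are assumptions.

```python
# Hypothetical sketch of adaptive context construction: prune a large schema
# down to the tables most relevant to the question. The overlap-based scoring
# is illustrative; the paper's selector uses schema-linking signals.

def build_adaptive_context(schema, question, max_tables=3):
    """Return a compact schema prompt keeping only tables whose names or
    columns lexically overlap with the question's tokens."""
    q_tokens = set(question.lower().replace("?", "").split())
    scored = []
    for table, columns in schema.items():
        # Score = number of table/column names that contain a question token
        names = {table.lower()} | {c.lower() for c in columns}
        score = sum(1 for n in names if any(tok in n for tok in q_tokens))
        scored.append((score, table, columns))
    scored.sort(reverse=True)
    kept = [(t, cols) for s, t, cols in scored[:max_tables] if s > 0]
    # Serialize the pruned schema into a prompt-friendly string
    return "\n".join(f"TABLE {t} ({', '.join(cols)})" for t, cols in kept)

schema = {
    "orders": ["order_id", "customer_id", "total"],
    "customers": ["customer_id", "name", "country"],
    "audit_log": ["event_id", "payload"],
}
print(build_adaptive_context(schema, "Which customers placed orders?"))
```

Only `orders` and `customers` survive the pruning here; `audit_log` scores zero and is dropped, shrinking the prompt that reaches the LLM.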
Progressive Generation State
- The LLM receives the adaptive context and begins generating a partial SQL statement.
- After each generation step, the partial query is executed against a sandboxed DB instance.
- Execution results (e.g., error messages, row counts) are fed back as a new “state” token, prompting the model to adjust the next fragment.
- This loop continues until the model produces a syntactically correct, semantically aligned query that yields the expected result.
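The execution-feedback loop above can be sketched with an in-memory SQLite sandbox. The `revise` function stands in for the LLM's revision step and is a hard-coded stub here; everything except the loop structure is an assumption.

```python
import sqlite3

# Minimal sketch of feedback-guided generation: execute a candidate query in a
# sandboxed in-memory DB and feed any error back for revision, mimicking the
# paper's state-transition loop. `revise` is a stub, not the paper's model.

def execute_with_feedback(conn, candidate_sql, revise, max_steps=3):
    """Run candidate SQL; on failure, pass the error message to `revise`
    and retry with the revised query."""
    for _ in range(max_steps):
        try:
            rows = conn.execute(candidate_sql).fetchall()
            return candidate_sql, rows  # success state: query executed
        except sqlite3.Error as err:
            # Execution feedback becomes the next "state" the model conditions on
            candidate_sql = revise(candidate_sql, str(err))
    raise RuntimeError("no valid query within the step budget")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada')")

# Stub reviser: fixes a misspelled table name when the error mentions it
def revise(sql, error):
    return sql.replace("user", "users") if "no such table" in error else sql

sql, rows = execute_with_feedback(conn, "SELECT name FROM user", revise)
print(sql, rows)  # SELECT name FROM users [('Ada',)]
```

The first attempt fails with "no such table: user"; that error is fed back, the query is revised, and the second pass succeeds.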
Dual‑State Interaction
- The two states exchange information: the context can be refreshed if the generation reveals missing schema elements, and the generation can request additional context cues.
- The whole process is orchestrated by a simple controller that tracks state transitions, requiring no extra training data beyond the standard Text‑to‑SQL corpora.
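A controller of this kind can be sketched as a small loop that refreshes the context state whenever generation signals a missing schema element. Every callable below is a toy stand-in (not the paper's components), and the `missing:<table>` feedback convention is invented for illustration.

```python
# Illustrative dual-state controller: alternate between the context state and
# the generation state until a query is accepted. All components are stubs.

def dual_state_controller(question, build_context, generate, execute, max_rounds=5):
    hints = set()
    context = build_context(question, hints)
    for _ in range(max_rounds):
        sql = generate(question, context)
        ok, feedback = execute(sql)
        if ok:
            return sql
        if feedback.startswith("missing:"):
            # Generation revealed a schema element absent from the context;
            # refresh the context state with the new hint and retry.
            hints.add(feedback.split(":", 1)[1])
            context = build_context(question, hints)
    raise RuntimeError("no accepted query within the round budget")

# Toy components: the initial context omits the `orders` table on purpose.
def build_context(question, hints):
    return {"customers"} | hints

def generate(question, context):
    return ("SELECT * FROM customers JOIN orders" if "orders" in context
            else "SELECT * FROM customers")

def execute(sql):
    return (True, "") if "orders" in sql else (False, "missing:orders")

print(dual_state_controller("Which customers placed orders?",
                            build_context, generate, execute))
```

Round one fails because the context lacks `orders`; the controller folds that hint back into the context, and round two produces an accepted query, with no training beyond the components themselves.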
Results & Findings
| Benchmark | Execution Accuracy |
|---|---|
| Spider 2.0‑Snow (zero‑shot) | 35.28% |
| BIRD dev set (zero‑shot) | 68.32% |
- These numbers are competitive with methods that rely on heavy fine‑tuning or large prompt libraries.
- Ablation studies show that removing either the adaptive context or the feedback loop drops performance by ~10–15%, confirming that both states are essential.
- Error analysis indicates that most remaining failures stem from ambiguous natural‑language questions rather than schema‑linking or syntax errors.
Practical Implications
- Enterprise Data Access – Developers can embed DSR‑SQL into BI tools or chat‑ops assistants, letting non‑technical users ask complex questions over massive schemas without hitting LLM context limits.
- Reduced Engineering Overhead – Because the approach works zero‑shot, teams don’t need to maintain costly fine‑tuned models for each new database; a single LLM (e.g., GPT‑4‑Turbo) can be reused across projects.
- Self‑Correcting Pipelines – The feedback‑guided generation can be wrapped into automated ETL validation steps, catching malformed queries before they hit production databases.
- Extensibility – The open‑source codebase makes it straightforward to plug in custom schema selectors, domain‑specific execution monitors, or even integrate with LLMs hosted on‑prem for privacy‑sensitive environments.
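One way to realize the self-correcting pipeline idea above is to validate generated SQL against a schema-only sandbox before it reaches production. This sketch uses SQLite's `EXPLAIN`, which compiles a statement without executing it; the schema and function names are hypothetical, and a production ETL step would target its own dialect.

```python
import sqlite3

# Pre-production SQL validation: compile a generated query against an empty
# copy of the schema. No data is touched, so malformed queries are caught
# cheaply before they hit a real database.

def validate_sql(schema_ddl, sql):
    """Return (True, None) if `sql` compiles against the schema,
    else (False, error_message)."""
    conn = sqlite3.connect(":memory:")  # sandbox holds schema only, no rows
    conn.executescript(schema_ddl)
    try:
        conn.execute("EXPLAIN " + sql)  # compile-only check in SQLite
        return True, None
    except sqlite3.Error as err:
        return False, str(err)
    finally:
        conn.close()

ddl = "CREATE TABLE sales (id INTEGER, amount REAL);"
print(validate_sql(ddl, "SELECT SUM(amount) FROM sales"))  # (True, None)
print(validate_sql(ddl, "SELECT SUM(amount) FROM salez"))  # fails: no such table
```

A gate like this can run inside CI or an ETL orchestrator, rejecting queries before deployment rather than at runtime.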
Limitations & Future Work
- Scalability of Execution Feedback – Running partial queries after every generation step can be costly for very large tables; smarter caching or static analysis could mitigate this.
- Ambiguity Handling – The current controller assumes a single correct answer; future work could incorporate clarification dialogs to resolve ambiguous user intents.
- Domain‑Specific Semantics – While the adaptive context captures schema structure, deeper business logic (e.g., fiscal calendars, custom functions) still requires manual extensions.
- Benchmark Diversity – The paper evaluates on Spider 2.0‑Snow and BIRD; testing on more industry‑specific datasets (e.g., healthcare, finance) would further validate real‑world robustness.
DSR‑SQL shows that a disciplined, two‑state reasoning loop can bridge the gap between powerful LLMs and the practical constraints of enterprise databases, opening the door for more reliable, zero‑shot Text‑to‑SQL assistants.
Authors
- Zhifeng Hao
- Qibin Song
- Ruichu Cai
- Boyan Xu
Paper Information
- arXiv ID: 2511.21402v1
- Categories: cs.CL
- Published: November 26, 2025