[Paper] Text-to-SQL as Dual-State Reasoning: Integrating Adaptive Context and Progressive Generation

Published: November 26, 2025 at 08:52 AM EST
4 min read
Source: arXiv - 2511.21402v1

Overview

The paper presents DSR‑SQL, a new “dual‑state” reasoning framework that tackles Text‑to‑SQL generation for large, real‑world databases. By separating the problem into an adaptive context that trims and clarifies the schema, and a progressive generation loop that iteratively builds and self‑corrects the SQL query, the authors achieve strong results on challenging benchmarks without any extra fine‑tuning or hand‑crafted prompts.

Key Contributions

  • Dual‑State Reasoning – Introduces two interacting states (context and generation) that jointly guide the model, a departure from single‑pass or pure chain‑of‑thought approaches.
  • Adaptive Context Construction – Automatically refines massive schemas into a compact, semantically faithful representation, reducing context overload and improving schema linking.
  • Feedback‑Guided Progressive Generation – Models SQL synthesis as a series of state transitions where the model can inspect its own partial output, receive execution feedback, and revise the query on‑the‑fly.
  • Zero‑Shot Performance – Achieves 35.28 % execution accuracy on Spider 2.0‑Snow and 68.32 % on the BIRD dev set without any post‑training, in‑context examples, or external tools.
  • Open‑Source Release – Provides a ready‑to‑run implementation (GitHub) for the community to reproduce and extend.
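The adaptive context idea from the contributions above can be illustrated with a deliberately simple sketch. The paper's selector is driven by learned schema-linking signals; the keyword filter, the `prune_schema` helper, and the toy schema below are hypothetical stand-ins that only show the shape of the interface (full schema in, question-relevant subset out):

```python
# Hypothetical sketch of adaptive context construction: keep only the tables
# and columns that lexically overlap with the question. The paper's selector
# uses learned schema-linking signals, not this naive keyword match.
def prune_schema(schema, question):
    """Return the subset of {table: columns} relevant to the question."""
    tokens = {t.strip("?,.").lower() for t in question.split()}
    pruned = {}
    for table, columns in schema.items():
        hits = [c for c in columns if c.lower() in tokens]
        if table.lower() in tokens or hits:
            # Keep matched columns; fall back to all columns on a table-name hit
            pruned[table] = hits or columns
    return pruned

schema = {
    "orders": ["id", "customer_id", "total"],
    "customers": ["id", "name", "region"],
    "audit_log": ["id", "event"],
}
context = prune_schema(schema, "What is the total for each customer name?")
# audit_log is pruned away; only question-relevant tables survive
```

The compacted `context` is what would be serialized into the prompt, keeping it within the LLM's context window while preserving the relevant structure.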

Methodology

  1. Adaptive Context State

    • Starts from the full database schema (tables, columns, foreign‑key relations).
    • A lightweight selector (trained on schema‑linking signals) prunes irrelevant tables/columns based on the natural‑language question.
    • The remaining “context” is encoded into a concise prompt that fits within LLM context windows while preserving key semantic relationships.
  2. Progressive Generation State

    • The LLM receives the adaptive context and begins generating a partial SQL statement.
    • After each generation step, the partial query is executed against a sandboxed DB instance.
    • Execution results (e.g., error messages, row counts) are fed back as a new “state” token, prompting the model to adjust the next fragment.
    • This loop continues until the model produces a syntactically correct, semantically aligned query that yields the expected result.
  3. Dual‑State Interaction

    • The two states exchange information: the context can be refreshed if the generation reveals missing schema elements, and the generation can request additional context cues.
    • The whole process is orchestrated by a simple controller that tracks state transitions, requiring no extra training data beyond the standard Text‑to‑SQL corpora.
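The progressive generation loop described above can be sketched as a small controller around a sandboxed database. Here the LLM call is replaced by a hypothetical `propose` callback and SQLite stands in for the sandbox; the paper's actual controller, prompt format, and feedback encoding may differ:

```python
# Minimal sketch of the feedback-guided generation loop: propose a query,
# execute it in a sandbox, and feed any error back as the next "state".
import sqlite3

def run_sandboxed(conn, sql):
    """Execute a candidate query; return (rows, None) or (None, error text)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)

def generate_sql(conn, propose, max_steps=5):
    """Iterate proposal -> execution -> feedback until a query runs cleanly."""
    feedback = None
    for _ in range(max_steps):
        sql = propose(feedback)           # context + last error -> next candidate
        rows, feedback = run_sandboxed(conn, sql)
        if feedback is None:              # executable: accept and stop
            return sql
    return None

# Toy demonstration: the first candidate has a typo; execution feedback
# ("no such column: totl") drives the revision on the second step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
candidates = iter(["SELECT totl FROM orders", "SELECT total FROM orders"])
final = generate_sql(conn, lambda fb: next(candidates))
```

In the paper's framing, `feedback` is the execution signal that triggers a state transition, and a real `propose` would condition the LLM on both the adaptive context and that signal.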

Results & Findings

| Benchmark | Execution Accuracy |
| --- | --- |
| Spider 2.0‑Snow (zero‑shot) | 35.28% |
| BIRD dev set (zero‑shot) | 68.32% |
  • These numbers are competitive with methods that rely on heavy fine‑tuning or large prompt libraries.
  • Ablation studies show that removing either the adaptive context or the feedback loop drops performance by ~10–15 %, confirming that both states are essential.
  • Error analysis indicates that most remaining failures stem from ambiguous natural‑language questions rather than schema‑linking or syntax errors.

Practical Implications

  • Enterprise Data Access – Developers can embed DSR‑SQL into BI tools or chat‑ops assistants, letting non‑technical users ask complex questions over massive schemas without hitting LLM context limits.
  • Reduced Engineering Overhead – Because the approach works zero‑shot, teams don’t need to maintain costly fine‑tuned models for each new database; a single LLM (e.g., GPT‑4‑Turbo) can be reused across projects.
  • Self‑Correcting Pipelines – The feedback‑guided generation can be wrapped into automated ETL validation steps, catching malformed queries before they hit production databases.
  • Extensibility – The open‑source codebase makes it straightforward to plug in custom schema selectors, domain‑specific execution monitors, or even integrate with LLMs hosted on‑prem for privacy‑sensitive environments.
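As a concrete instance of the "self-correcting pipelines" point, a generated query can be checked against a sandbox copy before it reaches production. The `is_safe_to_deploy` helper and SQLite sandbox below are hypothetical; SQLite's `EXPLAIN` prepares the statement without materializing results, so syntax and schema errors surface cheaply:

```python
# Hypothetical pre-deployment gate: reject generated queries that the
# sandbox schema cannot even plan, before they touch production.
import sqlite3

def is_safe_to_deploy(sandbox, sql):
    """True if the sandbox can prepare the query (valid syntax and schema)."""
    try:
        sandbox.execute("EXPLAIN " + sql)   # prepares the statement only
        return True
    except sqlite3.Error:
        return False

# Sandbox mirrors the production schema, not its data
sandbox = sqlite3.connect(":memory:")
sandbox.execute("CREATE TABLE sales (region TEXT, amount REAL)")

ok = is_safe_to_deploy(sandbox, "SELECT region, SUM(amount) FROM sales GROUP BY region")
bad = is_safe_to_deploy(sandbox, "SELECT amout FROM sales")  # misspelled column
```

A guard like this catches malformed queries in an ETL validation step, while the full feedback loop remains available for actually repairing them.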

Limitations & Future Work

  • Scalability of Execution Feedback – Running partial queries after every generation step can be costly for very large tables; smarter caching or static analysis could mitigate this.
  • Ambiguity Handling – The current controller assumes a single correct answer; future work could incorporate clarification dialogs to resolve ambiguous user intents.
  • Domain‑Specific Semantics – While the adaptive context captures schema structure, deeper business logic (e.g., fiscal calendars, custom functions) still requires manual extensions.
  • Benchmark Diversity – The paper evaluates on Spider 2.0‑Snow and BIRD; testing on more industry‑specific datasets (e.g., healthcare, finance) would further validate real‑world robustness.

DSR‑SQL shows that a disciplined, two‑state reasoning loop can bridge the gap between powerful LLMs and the practical constraints of enterprise databases, opening the door for more reliable, zero‑shot Text‑to‑SQL assistants.

Authors

  • Zhifeng Hao
  • Qibin Song
  • Ruichu Cai
  • Boyan Xu

Paper Information

  • arXiv ID: 2511.21402v1
  • Categories: cs.CL
  • Published: November 26, 2025