[Paper] FuzzySQL: Uncovering Hidden Vulnerabilities in DBMS Special Features with LLM-Driven Fuzzing
Source: arXiv - 2602.19490v1
Overview
FuzzySQL is a new fuzz‑testing framework that leverages large language models (LLMs) to generate and mutate SQL statements in ways that expose hidden bugs in the “special” features of modern database management systems (DBMS). By going beyond ordinary query syntax and targeting rarely‑used commands such as GTID, stored procedures, and process‑control statements, the authors demonstrate that many critical vulnerabilities remain invisible to traditional fuzzers.
Key Contributions
- LLM‑driven SQL generation – combines a grammar‑based seed generator with a large language model that can reason about SQL semantics and produce realistic, feature‑rich statements.
- Logic‑shifting progressive mutation – a novel mutation strategy that flips conditions, restructures control flow, and forces the DBMS down alternative execution paths.
- Hybrid error‑repair pipeline – automatically fixes syntactic and context‑sensitive failures using a mix of rule‑based patches and LLM‑guided semantic repairs, keeping the fuzzing loop alive.
- Cross‑DBMS evaluation – applied to MySQL, MariaDB, SQLite, PostgreSQL, and ClickHouse, uncovering 37 previously unknown vulnerabilities, including 7 tied to under‑tested special features.
- Real‑world impact – 29 bugs have been confirmed, 9 assigned CVE IDs, and 14 patches already released by vendors.
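The grammar‑based seed stage mentioned above can be pictured with a minimal sketch. This is an illustrative toy, not the authors' generator: the grammar rules and function names here are assumptions, and the real FuzzySQL grammar covers far more of SQL.

```python
import random

# A toy SQL grammar; the real FuzzySQL grammar is far richer.
GRAMMAR = {
    "<stmt>":  [["SELECT", "<cols>", "FROM", "<table>", "<where>"]],
    "<cols>":  [["*"], ["id"], ["id", ",", "name"]],
    "<table>": [["users"], ["orders"]],
    "<where>": [[], ["WHERE", "id", ">", "<num>"]],
    "<num>":   [["0"], ["5"], ["100"]],
}

def expand(symbol, rng):
    """Recursively expand a grammar symbol into a list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]          # terminal token
    out = []
    for tok in rng.choice(GRAMMAR[symbol]):
        out.extend(expand(tok, rng))
    return out

def generate_seed(rng=None):
    """Produce one baseline SQL seed query from the grammar."""
    rng = rng or random.Random()
    return " ".join(expand("<stmt>", rng))
```

Seeds produced this way are syntactically valid but deliberately bland; the LLM enrichment stage is what pushes them toward rarely used features.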
Methodology
- Grammar‑Guided Seed Creation – a traditional SQL grammar produces baseline queries covering common DML/DDL patterns.
- LLM‑Assisted Enrichment – a large language model (e.g., GPT‑4) expands these seeds with rarely‑used constructs (e.g., `SET GLOBAL GTID_PURGED`, `CALL PROCEDURE`, `KILL QUERY`). The model is prompted to respect the target DBMS's dialect.
- Logic‑Shifting Progressive Mutation – instead of simple token swaps, the framework systematically negates predicates (`WHERE x > 5` → `WHERE NOT (x > 5)`), swaps branches, and injects control‑flow statements, forcing the engine to explore alternative execution branches.
- Hybrid Error Repair – when a generated query fails to parse or violates runtime constraints, a two‑stage repair kicks in:
- Rule‑based patcher fixes obvious syntax errors (missing commas, mismatched quotes).
- LLM‑semantic fixer rewrites the query to satisfy contextual requirements (e.g., adding missing privileges, correcting datatype mismatches).
- Instrumentation & Crash Detection – the fuzzing harness monitors process exits, sanitizer alerts, and DBMS logs to capture crashes, memory corruptions, or security‑relevant error messages.
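The logic‑shifting mutations described above can be sketched as simple query rewrites. This is a minimal illustration under assumed conventions, not the paper's implementation; the regex‑based negation and branch swap are stand‑ins for the framework's richer mutation operators.

```python
import random
import re

def negate_predicate(sql: str) -> str:
    """Wrap the WHERE predicate in NOT (...) to flip the branch taken."""
    m = re.search(r"\bWHERE\b\s+(.*)", sql, flags=re.IGNORECASE)
    if not m:
        return sql
    return sql[:m.start(1)] + f"NOT ({m.group(1)})"

def swap_branches(sql: str) -> str:
    """Swap the THEN/ELSE arms of a simple CASE expression."""
    return re.sub(
        r"CASE WHEN (.+?) THEN (.+?) ELSE (.+?) END",
        r"CASE WHEN \1 THEN \3 ELSE \2 END",
        sql,
    )

def mutate(sql: str, rng: random.Random) -> str:
    """Apply one randomly chosen logic-shifting mutation."""
    return rng.choice([negate_predicate, swap_branches])(sql)
```

For example, `negate_predicate("SELECT * FROM t WHERE x > 5")` yields `SELECT * FROM t WHERE NOT (x > 5)`, steering the engine toward the complementary row set and a different execution path.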
The loop repeats, continuously feeding successful mutations back into the LLM for further diversification.
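The rule‑based first stage of the hybrid repair can be sketched as a small set of mechanical patches. The specific rules below are illustrative assumptions; the paper's actual patcher, and the LLM semantic fixer that handles what rules cannot, are not reproduced here.

```python
def rule_based_patch(sql: str) -> str:
    """First repair stage: fix mechanical syntax errors before
    escalating to the (not shown) LLM-based semantic fixer."""
    patched = sql.strip()
    # Balance an odd number of single quotes by closing the last literal.
    if patched.count("'") % 2 == 1:
        patched += "'"
    # Close any unmatched opening parentheses.
    patched += ")" * max(0, patched.count("(") - patched.count(")"))
    # Ensure the statement is terminated.
    if not patched.endswith(";"):
        patched += ";"
    return patched

def repair(sql: str, llm_fix=None) -> str:
    """Two-stage hybrid repair: cheap rules first, LLM fallback second."""
    patched = rule_based_patch(sql)
    # In the real pipeline, an LLM call would rewrite queries that the
    # rule-based stage could not make executable.
    return llm_fix(patched) if llm_fix else patched
```

Keeping the cheap rule-based stage in front of the LLM is what keeps the fuzzing loop fast: most broken queries never need an inference call.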
Results & Findings
| DBMS | Vulnerabilities Found | CVEs Assigned | Fixed by Vendor |
|---|---|---|---|
| MySQL | 12 | 4 | 5 |
| MariaDB | 8 | 2 | 3 |
| SQLite | 5 | 1 | 2 |
| PostgreSQL | 7 | 1 | 2 |
| ClickHouse | 5 | 1 | 2 |
- 7 bugs were directly linked to “special features” (e.g., GTID mode, `KILL`, stored procedures) that conventional fuzzers rarely touch.
- Crash types included segmentation faults, assertion failures, and privilege‑escalation paths where a low‑privileged query could affect global state.
- Performance – the LLM‑augmented mutation pipeline generated roughly twice as many unique execution paths per hour as a baseline grammar‑only fuzzer.
These findings confirm that DBMSs still harbor deep, semantic‑level bugs despite years of hardening.
Practical Implications
- Security‑first development – Teams building applications that rely on advanced DBMS features (replication, custom procedures, multi‑tenant isolation) should incorporate semantic fuzzing into their CI pipelines, not just syntactic SQL validation.
- Vendor tooling – Database vendors can adopt the logic‑shifting mutation strategy to improve internal regression suites, especially for newly added system‑level commands.
- LLM‑assisted testing – The hybrid repair pipeline demonstrates a practical way to keep LLMs in the loop without overwhelming developers with false positives; similar patterns can be applied to other complex configuration languages (e.g., Kubernetes YAML, Terraform).
- Compliance & Auditing – Discovering hidden privilege‑escalation paths helps organizations meet standards like PCI‑DSS or ISO 27001, where undocumented DBMS capabilities must be accounted for.
In short, FuzzySQL shows that a modest investment in LLM‑driven fuzzing can surface high‑impact bugs that would otherwise stay hidden for years.
Limitations & Future Work
- LLM dependence – The quality of generated queries hinges on the underlying model; smaller or older models may miss subtle dialect nuances.
- Resource intensity – Running LLM inference at scale adds CPU/GPU overhead, which may be prohibitive for small teams.
- Coverage gaps – While special features are better exercised, some DBMS‑specific extensions (e.g., custom plugins, user‑defined functions) remain out of scope.
- Future directions suggested by the authors include: integrating reinforcement learning to prioritize mutations that historically trigger crashes, extending the framework to NoSQL stores, and open‑sourcing a lightweight “LLM‑lite” mutation engine for broader adoption.
Authors
- Yongxin Chen
- Zhiyuan Jiang
- Chao Zhang
- Haoran Xu
- Shenglin Xu
- Jianping Tang
- Zheming Li
- Peidai Xie
- Yongjun Wang
Paper Information
- arXiv ID: 2602.19490v1
- Categories: cs.DB, cs.CR, cs.SE
- Published: February 23, 2026