[Paper] FuzzySQL: Uncovering Hidden Vulnerabilities in DBMS Special Features with LLM-Driven Fuzzing
Source: arXiv - 2602.19490v1
Overview
FuzzySQL is a new fuzz‑testing framework that leverages large language models (LLMs) to generate and mutate SQL statements in ways that expose hidden bugs in the “special” features of modern database management systems (DBMS). By going beyond ordinary query syntax and targeting rarely‑used commands such as GTID, stored procedures, and process‑control statements, the authors demonstrate that many critical vulnerabilities remain invisible to traditional fuzzers.
Key Contributions
- LLM‑driven SQL generation – combines a grammar‑based seed generator with a large language model that can reason about SQL semantics and produce realistic, feature‑rich statements.
- Logic‑shifting progressive mutation – a novel mutation strategy that flips conditions, restructures control flow, and forces the DBMS down alternative execution paths.
- Hybrid error‑repair pipeline – automatically fixes syntactic and context‑sensitive failures using a mix of rule‑based patches and LLM‑guided semantic repairs, keeping the fuzzing loop alive.
- Cross‑DBMS evaluation – applied to MySQL, MariaDB, SQLite, PostgreSQL, and ClickHouse, uncovering 37 previously unknown vulnerabilities, including 7 tied to under‑tested special features.
- Real‑world impact – 29 bugs have been confirmed, 9 assigned CVE IDs, and 14 patches already released by vendors.
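The grammar‑based seed stage mentioned above can be pictured with a minimal sketch. This is an illustrative toy, not the authors' generator: the grammar rules and function names here are assumptions, and the real FuzzySQL grammar covers far more of SQL.

```python
import random

# A toy SQL grammar; the real FuzzySQL grammar is far richer.
GRAMMAR = {
    "<stmt>":  [["SELECT", "<cols>", "FROM", "<table>", "<where>"]],
    "<cols>":  [["*"], ["id"], ["id", ",", "name"]],
    "<table>": [["users"], ["orders"]],
    "<where>": [[], ["WHERE", "id", ">", "<num>"]],
    "<num>":   [["0"], ["5"], ["100"]],
}

def expand(symbol, rng):
    """Recursively expand a grammar symbol into a list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]          # terminal token
    out = []
    for tok in rng.choice(GRAMMAR[symbol]):
        out.extend(expand(tok, rng))
    return out

def generate_seed(rng=None):
    """Produce one baseline SQL seed query from the grammar."""
    rng = rng or random.Random()
    return " ".join(expand("<stmt>", rng))
```

Seeds produced this way are syntactically valid but deliberately bland; the LLM enrichment stage is what pushes them toward rarely used features.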
Methodology
- Grammar‑Guided Seed Creation – a traditional SQL grammar produces baseline queries covering common DML/DDL patterns.
- LLM‑Assisted Enrichment – a large language model (e.g., GPT‑4) expands these seeds with rarely‑used constructs (e.g., `SET GLOBAL GTID_PURGED`, `CALL PROCEDURE`, `KILL QUERY`). The model is prompted to respect the target DBMS's dialect.
- Logic‑Shifting Progressive Mutation – instead of simple token swaps, the framework systematically negates predicates (`WHERE x > 5` → `WHERE NOT (x > 5)`), swaps branches, and injects control‑flow statements, forcing the engine to explore alternative execution branches.
- Hybrid Error Repair – when a generated query fails to parse or violates runtime constraints, a two‑stage repair kicks in:
- Rule‑based patcher fixes obvious syntax errors (missing commas, mismatched quotes).
- LLM‑semantic fixer rewrites the query to satisfy contextual requirements (e.g., adding missing privileges, correcting datatype mismatches).
- Instrumentation & Crash Detection – the fuzzing harness monitors process exits, sanitizer alerts, and DBMS logs to capture crashes, memory corruptions, or security‑relevant error messages.
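The logic‑shifting mutations described above can be sketched as simple query rewrites. This is a minimal illustration under assumed conventions, not the paper's implementation; the regex‑based negation and branch swap are stand‑ins for the framework's richer mutation operators.

```python
import random
import re

def negate_predicate(sql: str) -> str:
    """Wrap the WHERE predicate in NOT (...) to flip the branch taken."""
    m = re.search(r"\bWHERE\b\s+(.*)", sql, flags=re.IGNORECASE)
    if not m:
        return sql
    return sql[:m.start(1)] + f"NOT ({m.group(1)})"

def swap_branches(sql: str) -> str:
    """Swap the THEN/ELSE arms of a simple CASE expression."""
    return re.sub(
        r"CASE WHEN (.+?) THEN (.+?) ELSE (.+?) END",
        r"CASE WHEN \1 THEN \3 ELSE \2 END",
        sql,
    )

def mutate(sql: str, rng: random.Random) -> str:
    """Apply one randomly chosen logic-shifting mutation."""
    return rng.choice([negate_predicate, swap_branches])(sql)
```

For example, `negate_predicate("SELECT * FROM t WHERE x > 5")` yields `SELECT * FROM t WHERE NOT (x > 5)`, steering the engine toward the complementary row set and a different execution path.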
The loop repeats, continuously feeding successful mutations back into the LLM for further diversification.
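The rule‑based first stage of the hybrid repair can be sketched as a small set of mechanical patches. The specific rules below are illustrative assumptions; the paper's actual patcher, and the LLM semantic fixer that handles what rules cannot, are not reproduced here.

```python
def rule_based_patch(sql: str) -> str:
    """First repair stage: fix mechanical syntax errors before
    escalating to the (not shown) LLM-based semantic fixer."""
    patched = sql.strip()
    # Balance an odd number of single quotes by closing the last literal.
    if patched.count("'") % 2 == 1:
        patched += "'"
    # Close any unmatched opening parentheses.
    patched += ")" * max(0, patched.count("(") - patched.count(")"))
    # Ensure the statement is terminated.
    if not patched.endswith(";"):
        patched += ";"
    return patched

def repair(sql: str, llm_fix=None) -> str:
    """Two-stage hybrid repair: cheap rules first, LLM fallback second."""
    patched = rule_based_patch(sql)
    # In the real pipeline, an LLM call would rewrite queries that the
    # rule-based stage could not make executable.
    return llm_fix(patched) if llm_fix else patched
```

Keeping the cheap rule-based stage in front of the LLM is what keeps the fuzzing loop fast: most broken queries never need an inference call.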
Results & Findings
| DBMS | Vulnerabilities Found | CVEs Assigned | Fixed by Vendor |
|---|---|---|---|
| MySQL | 12 | 4 | 5 |
| MariaDB | 8 | 2 | 3 |
| SQLite | 5 | 1 | 2 |
| PostgreSQL | 7 | 1 | 2 |
| ClickHouse | 5 | 1 | 2 |
- 7 bugs were directly linked to “special features” (e.g., GTID mode, `KILL`, stored procedures) that conventional fuzzers rarely touch.
- Crash types included segmentation faults, assertion failures, and privilege‑escalation paths where a low‑privileged query could affect global state.
- Performance – the LLM‑augmented mutation pipeline generated roughly twice as many unique execution paths per hour as a baseline grammar‑only fuzzer.
These findings confirm that DBMSs still harbor deep, semantic‑level bugs despite years of hardening.
Practical Implications
- Security‑first development – Teams building applications that rely on advanced DBMS features (replication, custom procedures, multi‑tenant isolation) should incorporate semantic fuzzing into their CI pipelines, not just syntactic SQL validation.
- Vendor tooling – Database vendors can adopt the logic‑shifting mutation strategy to improve internal regression suites, especially for newly added system‑level commands.
- LLM‑assisted testing – The hybrid repair pipeline demonstrates a practical way to keep LLMs in the loop without overwhelming developers with false positives; similar patterns can be applied to other complex configuration languages (e.g., Kubernetes YAML, Terraform).
- Compliance & Auditing – Discovering hidden privilege‑escalation paths helps organizations meet standards like PCI‑DSS or ISO 27001, where undocumented DBMS capabilities must be accounted for.
In short, FuzzySQL shows that a modest investment in LLM‑driven fuzzing can surface high‑impact bugs that would otherwise stay hidden for years.
Limitations & Future Work
- LLM dependence – The quality of generated queries hinges on the underlying model; smaller or older models may miss subtle dialect nuances.
- Resource intensity – Running LLM inference at scale adds CPU/GPU overhead, which may be prohibitive for small teams.
- Coverage gaps – While special features are better exercised, some DBMS‑specific extensions (e.g., custom plugins, user‑defined functions) remain out of scope.
- Future directions suggested by the authors include: integrating reinforcement learning to prioritize mutations that historically trigger crashes, extending the framework to NoSQL stores, and open‑sourcing a lightweight “LLM‑lite” mutation engine for broader adoption.
Authors
- Yongxin Chen
- Zhiyuan Jiang
- Chao Zhang
- Haoran Xu
- Shenglin Xu
- Jianping Tang
- Zheming Li
- Peidai Xie
- Yongjun Wang
Paper Information
- arXiv ID: 2602.19490v1
- Categories: cs.DB, cs.CR, cs.SE
- Published: February 23, 2026