Building a PostgreSQL Parser in Go: What Broke After We Open-Sourced It

Published: February 18, 2026, 01:18 AM EST
3 min read
Source: Dev.to


postgresparser is a pure‑Go PostgreSQL SQL parser. It turns SQL text into structured metadata (tables, columns, joins, filters, DDL actions, parameters) without executing queries.

We thought it was solid. Open‑sourcing it proved us wrong.
Here’s what we learned.

The biggest shift: use‑case expansion

We built the parser for our own workflow. Public users showed up with very different workloads. In the first week after release, most feedback centered on deterministic batch parsing.

Internal assumptions broke immediately

Inside a single team, ambiguous behavior survives because everyone “knows” the rules. Public users do not have that context.

The first pressure point was multi‑statement SQL. We had ParseSQL (single statement) and assumed batch parsing was “close enough.” It was not.

People were using the parser for:

  • CI linting pipelines
  • Production tools
  • LLM wrappers

Practical questions we could not answer cleanly:

  • Which exact statement failed?
  • Is this a warning or a hard failure?
  • Can I map diagnostics to the original SQL text reliably?

Those questions forced us to define strict contracts instead of relying on implied behavior. If your tool consumes SQL in bulk, batch correlation is everything.

Broken behavior example

SELECT 1;
SELECT FROM;
SELECT 2;

Early batch behavior made correlation awkward because results were compacted and diagnostics were not statement‑first. For CI checks or migration tooling, “something in the batch failed” is not actionable.

Now each statement has deterministic correlation (Index, RawSQL, Query, Warnings), so downstream code can point to the exact source statement.

Before/after API diff

- type ParseBatchResult struct {
-   Queries          []*ParsedQuery
-   Warnings         []ParseWarning
-   TotalStatements  int
-   ParsedStatements int
- }
+ type StatementParseResult struct {
+   Index    int
+   RawSQL   string
+   Query    *ParsedQuery   // nil => IR conversion failure
+   Warnings []ParseWarning // statement‑scoped warnings
+ }
+
+ type ParseBatchResult struct {
+   Statements       []StatementParseResult
+   TotalStatements  int
+   ParsedStatements int
+   HasFailures      bool
+ }

That shape is less convenient for quick demos, but much better for real integration.
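To make that trade concrete, here is a sketch of downstream code consuming the new shape. The types are local stand‑ins mirroring the structs in the diff above (so the sketch compiles without the library), and classify is a hypothetical helper, not part of postgresparser — it just shows how the earlier questions ("which statement failed, and is it a warning or a hard failure?") become direct field checks:

```go
package main

import "fmt"

// Local stand-ins mirroring the result types from the diff above,
// so this sketch compiles without the library itself.
type ParsedQuery struct{}
type ParseWarning struct{ Message string }

type StatementParseResult struct {
	Index    int
	RawSQL   string
	Query    *ParsedQuery   // nil => IR conversion failure
	Warnings []ParseWarning // statement-scoped warnings
}

// classify is a hypothetical helper: per statement, did it fail
// outright (Query == nil), merely warn, or parse cleanly?
func classify(stmt StatementParseResult) string {
	switch {
	case stmt.Query == nil:
		return "fail"
	case len(stmt.Warnings) > 0:
		return "warn"
	default:
		return "ok"
	}
}

func main() {
	// The three-statement batch from the example above.
	stmts := []StatementParseResult{
		{Index: 0, RawSQL: "SELECT 1;", Query: &ParsedQuery{}},
		{Index: 1, RawSQL: "SELECT FROM;", Query: nil},
		{Index: 2, RawSQL: "SELECT 2;", Query: &ParsedQuery{}},
	}
	for _, s := range stmts {
		fmt.Printf("idx=%d status=%s raw=%q\n", s.Index, classify(s), s.RawSQL)
	}
}
```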

Real SQL in the wild is uglier than test fixtures

Open‑source usage brought SQL shapes we did not have in internal tests:

  • trailing semicolons and odd whitespace
  • invalid syntax in the middle of an otherwise valid batch
  • mixed DDL + DML scripts
  • ONLY variants in DDL paths (e.g. ALTER TABLE ONLY)

The parser had to become resilient without becoming vague. That meant:

  • better statement‑level warning attribution
  • explicit failure semantics (Query == nil)
  • tighter handling across DDL relation extraction paths

Concrete snippet (current behavior)

batch, err := postgresparser.ParseSQLAll(sql)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("total=%d parsed=%d has_failures=%t\n",
    batch.TotalStatements, batch.ParsedStatements, batch.HasFailures)

for _, stmt := range batch.Statements {
    fmt.Printf("idx=%d failed=%t warnings=%d raw=%q\n",
        stmt.Index, stmt.Query == nil, len(stmt.Warnings), stmt.RawSQL)
}

That is the integration model people asked for: deterministic, inspectable, and boring in the best way.
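For the CI‑linting use case specifically, that model collapses into a small gate. A sketch under the same assumptions as above — the result types are local stand‑ins for the library's structs, and gateBatch plus its exit‑code convention are mine, not part of postgresparser:

```go
package main

import "fmt"

// Local stand-ins for the library's result types, so the sketch
// compiles on its own.
type ParsedQuery struct{}
type ParseWarning struct{}

type StatementParseResult struct {
	Index    int
	RawSQL   string
	Query    *ParsedQuery
	Warnings []ParseWarning
}

type ParseBatchResult struct {
	Statements       []StatementParseResult
	TotalStatements  int
	ParsedStatements int
	HasFailures      bool
}

// gateBatch prints every hard failure with its index and raw SQL,
// then returns the exit code a CI step would use: 1 on any failure,
// 0 on a clean batch. gateBatch is a hypothetical helper.
func gateBatch(batch ParseBatchResult) int {
	if !batch.HasFailures {
		return 0
	}
	for _, stmt := range batch.Statements {
		if stmt.Query == nil {
			fmt.Printf("lint: statement %d failed: %q\n", stmt.Index, stmt.RawSQL)
		}
	}
	return 1
}

func main() {
	// In real use, batch would come from postgresparser.ParseSQLAll.
	batch := ParseBatchResult{
		Statements: []StatementParseResult{
			{Index: 0, RawSQL: "SELECT 1;", Query: &ParsedQuery{}},
			{Index: 1, RawSQL: "SELECT FROM;", Query: nil},
		},
		TotalStatements:  2,
		ParsedStatements: 1,
		HasFailures:      true,
	}
	fmt.Println("exit code:", gateBatch(batch))
}
```

Because the failure is statement‑scoped, the CI log points at the exact offending SQL instead of "something in the batch failed."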

Why this matters

Open‑source removed our ability to hand‑wave edge cases. The development loop became:

  1. Implement
  2. Get challenged
  3. Simplify
  4. Lock behavior with tests
  5. Document the contract

That loop made postgresparser better than it would have been as an internal‑only tool. Internal tools can survive ambiguity; public libraries cannot.

If you’re building something on top of postgresparser, open an issue. Real‑world SQL keeps improving the contract.
