Building a PostgreSQL Parser in Go: What Broke After We Open-Sourced It

Published: February 18, 2026, 01:18 AM EST
3 min read
Source: Dev.to


postgresparser is a pure‑Go PostgreSQL SQL parser. It turns SQL text into structured metadata (tables, columns, joins, filters, DDL actions, parameters) without executing queries.

We thought it was solid. Open‑sourcing it proved us wrong.
Here’s what we learned.

The biggest shift: use‑case expansion

We built the parser for our own workflow. Public users showed up with very different workloads. In the first week after release, most feedback centered on deterministic batch parsing.

Internal assumptions broke immediately

Inside a single team, ambiguous behavior survives because everyone “knows” the rules. Public users do not have that context.

The first pressure point was multi‑statement SQL. We had ParseSQL (single statement) and assumed batch parsing was “close enough.” It was not.

People were using the parser for:

  • CI linting pipelines
  • Production tools
  • LLM wrappers

Practical questions we could not answer cleanly:

  • Which exact statement failed?
  • Is this a warning or a hard failure?
  • Can I map diagnostics to the original SQL text reliably?

Those questions forced us to define strict contracts instead of relying on implied behavior. If your tool consumes SQL in bulk, batch correlation is everything.

Broken behavior example

SELECT 1;
SELECT FROM;
SELECT 2;

Early batch behavior made correlation awkward because results were compacted and diagnostics were not statement‑first. For CI checks or migration tooling, “something in the batch failed” is not actionable.

Now each statement has deterministic correlation (Index, RawSQL, Query, Warnings), so downstream code can point to the exact source statement.

Before/after API diff

- type ParseBatchResult struct {
-   Queries          []*ParsedQuery
-   Warnings         []ParseWarning
-   TotalStatements  int
-   ParsedStatements int
- }
+ type StatementParseResult struct {
+   Index    int
+   RawSQL   string
+   Query    *ParsedQuery   // nil => IR conversion failure
+   Warnings []ParseWarning // statement‑scoped warnings
+ }
+
+ type ParseBatchResult struct {
+   Statements       []StatementParseResult
+   TotalStatements  int
+   ParsedStatements int
+   HasFailures      bool
+ }

That shape is less convenient for quick demos, but much better for real integration.
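To make that trade concrete, here is a sketch of downstream code consuming the new shape. The types are local stand‑ins mirroring the structs in the diff above (so the sketch compiles without the library), and classify is a hypothetical helper, not part of postgresparser — it just shows how the earlier questions ("which statement failed, and is it a warning or a hard failure?") become direct field checks:

```go
package main

import "fmt"

// Local stand-ins mirroring the result types from the diff above,
// so this sketch compiles without the library itself.
type ParsedQuery struct{}
type ParseWarning struct{ Message string }

type StatementParseResult struct {
	Index    int
	RawSQL   string
	Query    *ParsedQuery   // nil => IR conversion failure
	Warnings []ParseWarning // statement-scoped warnings
}

// classify is a hypothetical helper: per statement, did it fail
// outright (Query == nil), merely warn, or parse cleanly?
func classify(stmt StatementParseResult) string {
	switch {
	case stmt.Query == nil:
		return "fail"
	case len(stmt.Warnings) > 0:
		return "warn"
	default:
		return "ok"
	}
}

func main() {
	// The three-statement batch from the example above.
	stmts := []StatementParseResult{
		{Index: 0, RawSQL: "SELECT 1;", Query: &ParsedQuery{}},
		{Index: 1, RawSQL: "SELECT FROM;", Query: nil},
		{Index: 2, RawSQL: "SELECT 2;", Query: &ParsedQuery{}},
	}
	for _, s := range stmts {
		fmt.Printf("idx=%d status=%s raw=%q\n", s.Index, classify(s), s.RawSQL)
	}
}
```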

Real SQL in the wild is uglier than test fixtures

Open‑source usage brought SQL shapes we did not have in internal tests:

  • trailing semicolons and odd whitespace
  • invalid syntax in the middle of an otherwise valid batch
  • mixed DDL + DML scripts
  • ONLY variants in DDL paths (e.g. ALTER TABLE ONLY)

The parser had to become resilient without becoming vague. That meant:

  • better statement‑level warning attribution
  • explicit failure semantics (Query == nil)
  • tighter handling across DDL relation extraction paths

Concrete snippet (current behavior)

batch, err := postgresparser.ParseSQLAll(sql)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("total=%d parsed=%d has_failures=%t\n",
    batch.TotalStatements, batch.ParsedStatements, batch.HasFailures)

for _, stmt := range batch.Statements {
    fmt.Printf("idx=%d failed=%t warnings=%d raw=%q\n",
        stmt.Index, stmt.Query == nil, len(stmt.Warnings), stmt.RawSQL)
}

That is the integration model people asked for: deterministic, inspectable, and boring in the best way.
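For the CI‑linting use case specifically, that model collapses into a small gate. A sketch under the same assumptions as above — the result types are local stand‑ins for the library's structs, and gateBatch plus its exit‑code convention are mine, not part of postgresparser:

```go
package main

import "fmt"

// Local stand-ins for the library's result types, so the sketch
// compiles on its own.
type ParsedQuery struct{}
type ParseWarning struct{}

type StatementParseResult struct {
	Index    int
	RawSQL   string
	Query    *ParsedQuery
	Warnings []ParseWarning
}

type ParseBatchResult struct {
	Statements       []StatementParseResult
	TotalStatements  int
	ParsedStatements int
	HasFailures      bool
}

// gateBatch prints every hard failure with its index and raw SQL,
// then returns the exit code a CI step would use: 1 on any failure,
// 0 on a clean batch. gateBatch is a hypothetical helper.
func gateBatch(batch ParseBatchResult) int {
	if !batch.HasFailures {
		return 0
	}
	for _, stmt := range batch.Statements {
		if stmt.Query == nil {
			fmt.Printf("lint: statement %d failed: %q\n", stmt.Index, stmt.RawSQL)
		}
	}
	return 1
}

func main() {
	// In real use, batch would come from postgresparser.ParseSQLAll.
	batch := ParseBatchResult{
		Statements: []StatementParseResult{
			{Index: 0, RawSQL: "SELECT 1;", Query: &ParsedQuery{}},
			{Index: 1, RawSQL: "SELECT FROM;", Query: nil},
		},
		TotalStatements:  2,
		ParsedStatements: 1,
		HasFailures:      true,
	}
	fmt.Println("exit code:", gateBatch(batch))
}
```

Because the failure is statement‑scoped, the CI log points at the exact offending SQL instead of "something in the batch failed."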

Why this matters

Open‑source removed our ability to hand‑wave edge cases. The development loop became:

  1. Implement
  2. Get challenged
  3. Simplify
  4. Lock behavior with tests
  5. Document the contract

That loop made postgresparser better than it would have been as an internal‑only tool. Internal tools can survive ambiguity; public libraries cannot.

If you’re building something on top of postgresparser, open an issue. Real‑world SQL keeps improving the contract.
