pg_ducklake: Columnar Storage for PostgreSQL

Published: (March 11, 2026 at 05:29 AM EDT)
3 min read
Source: Dev.to

Source: Dev.to

Cover image for pg_ducklake: Columnar Storage for PostgreSQL

Access pg_ducklake at the GitHub repository.

Introduction

PostgreSQL is the world’s most advanced open‑source relational database1. It boasts the broadest and most mature ecosystem in the modern data stack, from AI integrations and JDBC drivers to ORMs and robust tooling for monitoring, backups, and replication.

However, PostgreSQL is fundamentally a row‑store designed for transactions, which makes it less natural for large analytical scans and aggregations.

At the same time, the lakehouse approach—separating storage and compute and using open columnar file formats (often Parquet)—is becoming the default for analytics. Table formats like Delta Lake and Apache Iceberg have popularized this model, but they can be complex to manage due to metadata overhead.

DuckLake is a new participant that keeps SQL catalog metadata while storing data in open Parquet files. pg_ducklake sits at the intersection of these worlds, bringing a native lakehouse experience into PostgreSQL while remaining accessible from the DuckDB ecosystem.

Architecture

pg_ducklake creates a unified experience by bridging three components:

  • PostgreSQL (Interface & Catalog) – You manage and query tables using familiar PostgreSQL SQL, while all table metadata is stored natively in PostgreSQL heap tables.
  • DuckDB (Execution Engine) – A vectorized DuckDB engine is embedded directly within the PostgreSQL backend to handle analytical scans and aggregations efficiently.
  • DuckLake (Storage Format) – Data is stored as Parquet files on S3, with metadata kept in PostgreSQL. This allows external DuckDB clients (CLI, Python, etc.) to access the same tables.

Architecture diagram

What pg_ducklake Brings

The goal is simple: use PostgreSQL normally, but get lakehouse‑style tables when you need them.

  • Postgres‑native ergonomics – DuckLake tables are managed from PostgreSQL using familiar SQL and tooling, fitting naturally into Postgres apps, BI, and analyst workflows. In replica‑friendly deployments (e.g., serverless Postgres setups like Neon), you can scale read‑heavy analytics by adding read replicas.
  • Open tables by default – Parquet data + PostgreSQL catalog; DuckDB clients (CLI, Python, etc.) can read the “raw” DuckLake table by using PostgreSQL as the metadata provider.
  • DuckDB speed for analytics – Vectorized execution combined with columnar storage delivers fast scans and aggregations.

What’s Next for pg_ducklake

pg_ducklake is under active development, aiming for a production‑grade lakehouse experience inside PostgreSQL. The roadmap includes practical features such as:

  • Schema evolution
  • Time travel
  • Partitioning / layout controls
  • Table maintenance (compaction / garbage collection)
  • Clearer operational guidance as more real‑world users adopt the project

Feedback and contributions are very welcome, especially real‑world workloads, feature requests, and reports of any sharp edges you encounter.

pg_ducklake: Native lakehouse tables in PostgreSQL — Open data, DuckDB speed.

Footnotes

  1. Quoted from the PostgreSQL website.

0 views
Back to Blog

Related posts

Read more »

Travigo

Travel as fast as you speak with Gemini! Where live agents meet immersive storytelling & 3D navigation. This project was created for entering the Gemini Live Ag...

Micro games

Hey Gamers! 👾 As part of the Rapid Games Prototyping module, we are tasked with reviewing a peer's game. The challenge is to analyse a prototype built in just...