Video Conferencing with Postgres

Published: February 26, 2026 at 07:00 PM EST
7 min read

Source: PlanetScale Blog

Yesterday on X, SpacetimeDB tweeted that they had done “the world’s first video call over a database” and, in their own way, invited anyone else to give it a try.

Credit to them – it’s a cool idea!

In short, they built a frontend that:

  1. Captures audio and video from the browser’s media APIs.
  2. Encodes them into compact frames (PCM‑16LE audio, JPEG video).
  3. Sends the frames to a database that acts as a real‑time message broker.
  4. Streams the frames back out to the other participant’s browser for playback.

The implementation is open‑sourced (SpaceChatDB on GitHub).
I decided to see what it would look like using PostgreSQL, the world’s most popular open‑source database, to host the world’s second video call over a database.

How it works

Architecture

  • Frontend – SvelteKit app that captures media.
  • Relay – Small Node.js WebSocket server (pg‑relay) that validates calls and writes frames to PostgreSQL.
  • Database – $5 PlanetScale PostgreSQL instance that stores frames and provides a logical replication stream.

Call flow (video)

  1. Capture – Browser captures a camera frame, encodes it as JPEG, and sends it as a binary WebSocket message to pg‑relay.

  2. Insert – pg‑relay validates the call and runs:

    INSERT INTO video_frames (
        session_id, from_id, to_id, seq, width, height, jpeg
    ) VALUES ($1, $2, $3, $4, $5, $6, $7);
  3. WAL – PostgreSQL writes the row to the WAL (write‑ahead log).

  4. Replication – pg‑relay also runs a logical replication consumer on the same database. When the new row appears in the replication stream, it:

    • Checks the to_id column.
    • Forwards the raw JPEG bytes over WebSocket to the recipient.
  5. Playback – The recipient’s browser creates a Blob URL from the JPEG and renders it.

Audio follows the same pattern, using an audio_frames table.
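The forwarding step in the relay boils down to a lookup from to_id to an open WebSocket. A minimal sketch, with type and map names that are mine rather than from pg‑relay:

```typescript
// Hypothetical shape of a decoded replication event for video_frames.
type FrameRow = {
  session_id: string;
  from_id: string;
  to_id: string;
  seq: number;
  jpeg: Uint8Array;
};

// Anything that can receive binary data (a WebSocket in the real relay).
interface FrameSink {
  send(data: Uint8Array): void;
}

// Connected participants, keyed by their id.
const peers = new Map<string, FrameSink>();

// Forward the raw JPEG bytes to the recipient, if they are connected.
function forwardFrame(row: FrameRow): boolean {
  const sink = peers.get(row.to_id);
  if (!sink) return false; // recipient not connected; drop the frame
  sink.send(row.jpeg);
  return true;
}
```

The same routine can route audio frames; only the source table differs.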

Logical replication?

PostgreSQL’s logical replication gives us a reliable, ordered change stream:

  • INSERT, UPDATE, and DELETE events for every table in the publication are delivered in commit order.
  • No need to poll the table with SELECT statements fast enough to render 15 fps video.

Because the same mechanism pushes video frames, it can also push:

  • Chat messages.
  • User‑presence changes.
  • Call‑state transitions (e.g., a row DELETE when a user disconnects).
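Setting up that change stream takes only a couple of statements. The publication and slot names below are illustrative, not taken from the source:

```sql
-- Publish changes from the frame tables.
CREATE PUBLICATION media_pub FOR TABLE video_frames, audio_frames;

-- Create a logical replication slot using the built-in pgoutput plugin;
-- the relay's replication consumer connects to this slot.
SELECT pg_create_logical_replication_slot('media_slot', 'pgoutput');
```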

Video table schema

CREATE TABLE video_frames (
    id          BIGSERIAL   PRIMARY KEY,
    session_id  UUID        NOT NULL,
    from_id     TEXT        NOT NULL,
    to_id       TEXT        NOT NULL,
    seq         INT         NOT NULL,
    width       SMALLINT    NOT NULL,
    height      SMALLINT    NOT NULL,
    jpeg        BYTEA       NOT NULL,
    inserted_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Nothing special about this table – just rows with a JPEG in a BYTEA column.
The egress is modest and well within what a database like PostgreSQL can handle.

Media → PostgreSQL pipeline

Capture side

  • Video – Browser draws camera frames to an off‑screen canvas, calls canvas.toBlob() to get a JPEG.
  • Audio – An AudioWorkletNode collects PCM samples, resamples to 16 kHz mono, and encodes them as 16‑bit little‑endian integers.
  • Both payloads are packed into binary WebSocket frames with a small JSON header (session_id, seq, recipient) and sent to the relay.
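The exact wire layout isn't specified in the post; one simple assumed framing is a 4‑byte little‑endian header length, the UTF‑8 JSON header, then the raw payload bytes:

```typescript
// Assumed header shape: the fields the post mentions (session, seq, recipient).
type Header = { session_id: string; seq: number; to: string };

// Pack a JSON header plus binary payload into one binary WebSocket message.
function packFrame(header: Header, payload: Uint8Array): Uint8Array {
  const head = new TextEncoder().encode(JSON.stringify(header));
  const out = new Uint8Array(4 + head.length + payload.length);
  new DataView(out.buffer).setUint32(0, head.length, true); // LE length prefix
  out.set(head, 4);
  out.set(payload, 4 + head.length);
  return out;
}

// Inverse operation, as the relay would run on each incoming message.
function unpackFrame(msg: Uint8Array): { header: Header; payload: Uint8Array } {
  const len = new DataView(msg.buffer, msg.byteOffset).getUint32(0, true);
  const header = JSON.parse(
    new TextDecoder().decode(msg.subarray(4, 4 + len))
  ) as Header;
  return { header, payload: msg.subarray(4 + len) };
}
```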

Playback side

  • Video – Incoming JPEGs are turned into Blob URLs and set as the src of an <img> tag.
  • Audio – Samples are decoded back to floats and scheduled on an AudioBufferSourceNode with a small jitter buffer.
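The decode step is the inverse of the PCM‑16LE encoding on the capture side: each signed 16‑bit sample is scaled back into the [-1, 1] float range the Web Audio API expects. A sketch (the post doesn't show its exact conversion code):

```typescript
// Convert PCM-16LE bytes back to Float32 samples for an AudioBuffer.
function pcm16leToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    // Signed 16-bit little-endian sample, scaled into [-1, 1).
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```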

The whole system runs at 640 × 360 @ 15 fps with JPEG quality 0.65.
Each frame is roughly 25–40 KB, which translates to ≈ 375–600 KB/s of video per direction.

Accumulating rows

At 15 fps, a two‑way call generates about 108,000 video rows per hour (54,000 per direction).
To avoid unbounded growth, a cleanup job runs every 2 seconds and prunes frames older than 5 seconds:

DELETE FROM audio_frames
WHERE inserted_at < NOW() - INTERVAL '5 seconds';

DELETE FROM video_frames
WHERE inserted_at < NOW() - INTERVAL '5 seconds';

Counting rows per sender over the trailing 5‑second window confirms the frame rate:

SELECT from_id,
       COUNT(*)                 AS frames_5s,
       ROUND(COUNT(*) / 5.0, 1) AS approx_fps
FROM video_frames
WHERE inserted_at > NOW() - INTERVAL '5 seconds'
GROUP BY from_id
ORDER BY frames_5s DESC;

Result (example):

from_id                                                          | frames_5s | approx_fps
06cad97a128947a58e8ff754ec1171d4200c4d774b5c43c9b7637f11bba61036 |        76 |       15.2
4f984feaee1042939383b0fffb3f1fc172d28aa92b654551a02c618788021995 |        76 |       15.2

Look at that – our $5 PostgreSQL is streaming bidirectional 15 fps video!
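The cleanup job itself is a trivial timer. In this sketch the query runner is injected so the pruning logic stays independent of the database driver; in the real relay it would be a node-postgres client's query method (the function names here are mine):

```typescript
// Build the retention DELETE for a given media table.
const pruneSql = (table: string): string =>
  `DELETE FROM ${table} WHERE inserted_at < NOW() - INTERVAL '5 seconds'`;

type QueryFn = (sql: string) => Promise<void>;

// Run the cleanup every 2 seconds against both frame tables.
function startCleanup(
  query: QueryFn,
  intervalMs = 2000
): ReturnType<typeof setInterval> {
  return setInterval(() => {
    for (const table of ["video_frames", "audio_frames"]) {
      void query(pruneSql(table));
    }
  }, intervalMs);
}
```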

Persistence benefits

Because each JPEG is durably persisted:

  • The data is crash‑safe, replicable, and queryable later.
  • Hours of video can be stored for later analysis or replay.

I can even pull a single frame directly from the database and render it in the terminal (e.g., using catimg or an ANSI‑image viewer).

Could we have done this another way?

LISTEN/NOTIFY

PostgreSQL’s built‑in pub/sub (LISTEN/NOTIFY) seemed attractive:

  • No extra tables needed – just blast JPEG bytes through a notification channel.

Problem: Payload limit of 8 KB.
A 640 × 360 JPEG (25–40 KB) would need to be split into 4–5 notifications, requiring:

  • Chunking and reassembly logic.
  • Ordering guarantees.
  • Handling of dropped chunks.

That essentially recreates a TCP‑like protocol on top of NOTIFY, adding unnecessary complexity. Audio frames often fit under 8 KB, so a hybrid approach is possible, but mixing transport mechanisms defeats the simplicity I was after.
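To make the cost concrete, here is roughly the bookkeeping that chunking over NOTIFY would require, sketched as split and reassemble helpers (not code from the post):

```typescript
type Chunk = { seq: number; total: number; data: Uint8Array };

const CHUNK = 7000; // stay safely under the ~8 KB notification payload limit

// Split one JPEG frame into ordered chunks small enough for NOTIFY.
function splitFrame(frame: Uint8Array): Chunk[] {
  const total = Math.ceil(frame.length / CHUNK);
  const chunks: Chunk[] = [];
  for (let seq = 0; seq < total; seq++) {
    chunks.push({ seq, total, data: frame.subarray(seq * CHUNK, (seq + 1) * CHUNK) });
  }
  return chunks;
}

// Rebuild the frame on the receiving side; returns null if any chunk was dropped.
function reassemble(chunks: Chunk[]): Uint8Array | null {
  if (chunks.length === 0 || chunks.length !== chunks[0].total) return null;
  const sorted = [...chunks].sort((a, b) => a.seq - b.seq);
  const out = new Uint8Array(sorted.reduce((n, c) => n + c.data.length, 0));
  let off = 0;
  for (const c of sorted) {
    out.set(c.data, off);
    off += c.data.length;
  }
  return out;
}
```

And this still ignores interleaving of chunks from concurrent senders, which a real implementation would also have to handle.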

Unlogged tables

Another alternative is to use unlogged tables for inbound data:

  • Unlogged tables skip the WAL, eliminating fsync overhead.
  • They are faster for write‑heavy workloads but sacrifice durability (data is lost on crash).

In this demo, durability is a feature, so I kept the regular (logged) tables.

Summary

  • PostgreSQL’s logical replication provides a clean, ordered, and reliable change stream that can replace polling for real‑time media transport.
  • By treating the database as a message broker, we can build a fully functional video‑call system with minimal infrastructure (a small Node.js relay and a cheap PostgreSQL instance).
  • The approach also gives us persistent, queryable video frames for later analysis, something traditional media servers don’t offer out of the box.

Feel free to explore the source code and adapt the pattern to other real‑time use cases!

Crash Recovery

With unlogged tables, inserts are faster precisely because PostgreSQL isn't making durability guarantees about video frames.

I didn't like that, and there's a bigger problem: logical replication reads from the WAL. If a table doesn't write to the WAL, its rows never appear in the replication stream. To make this work, we'd have to fall back to polling:

SELECT * FROM video_frames WHERE seq > $1;

…in a loop. This might have worked fine—maybe even better—but something about rendering video from a polling loop of SELECT * didn’t feel right.
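That polling loop is easy to sketch: track the highest seq seen and repeatedly fetch newer rows. The fetch function is injected here; in the real relay it would run the SELECT above against PostgreSQL (names are mine):

```typescript
type Frame = { seq: number; jpeg: Uint8Array };
type FetchNewer = (afterSeq: number) => Promise<Frame[]>;

// Poll for frames newer than the last one seen, delivering each exactly once.
// `rounds` bounds the loop so the sketch terminates; a real relay would loop
// until the call ends.
async function pollFrames(
  fetchNewer: FetchNewer,
  onFrame: (f: Frame) => void,
  rounds: number
): Promise<number> {
  let lastSeq = 0;
  for (let i = 0; i < rounds; i++) {
    const rows = await fetchNewer(lastSeq);
    for (const row of rows) {
      onFrame(row);
      lastSeq = Math.max(lastSeq, row.seq);
    }
  }
  return lastSeq;
}
```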

How’d it go?

You be the judge. It exceeded my expectations.

Our $5 PlanetScale PostgreSQL was able to keep up with the insert rate of live video and audio, and browsers are optimized enough that they can take raw JPEG frames and turn them into video pretty convincingly.

Audio‑Video Sync Tweaks

The only adjustments I made after the first successful run were adding some boundaries to keep audio in sync. Video frames render instantly (we just swap the image), but audio needs to be buffered and scheduled ahead of time to avoid gaps. Getting them to stay in sync required clamping the audio scheduling buffer so it can’t drift too far ahead of real time:

const now = audioCtx.currentTime;
// If the queue has drifted more than 150 ms ahead of "now", snap it back
// so audio stops lagging behind the video.
const clamped = nextPlayTime > now + 0.15 ? now + 0.02 : nextPlayTime;
// Never schedule in the past; keep a 20 ms safety margin.
const startAt = Math.max(clamped, now + 0.02);

Should You Do This?

No! Use WebRTC!

But if you want to understand how logical replication works and see how far you can push PostgreSQL as a general‑purpose real‑time backend, this is a fun way to find out. The entire relay server is about 400 lines of TypeScript.

My fork:

If only Alexander Graham Bell could see us now.
