Searchable JSON compression: page-level random access + ms lookups (and smaller than Zstd on our dataset)

Published: February 19, 2026 at 02:12 PM EST

Source: Dev.to

Why this matters: the hidden “decompress+parse tax”

If you store NDJSON compressed with zstd, most queries still have to:

- read large chunks
- decompress everything
- parse the JSON
- scan for the field/value you need

Even when the compressed data is modest in size, repeating this CPU + I/O pattern on every query becomes brutal at scale.
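To make the tax concrete, here is a minimal Python sketch of the baseline path. It uses the standard library's zlib as a stand-in for zstd (the access pattern is the same: one compressed stream), and the dataset and the `naive_eq_lookup` helper are hypothetical.

```python
import json
import zlib

# Hypothetical NDJSON dataset: one JSON object per line.
records = [
    {"id": i, "user": f"u{i % 100}", "status": "error" if i % 7 == 0 else "ok"}
    for i in range(10_000)
]
ndjson = "\n".join(json.dumps(r) for r in records).encode("utf-8")

# zlib stands in for zstd: either way the file is a single compressed stream.
blob = zlib.compress(ndjson, 9)


def naive_eq_lookup(blob: bytes, field: str, value) -> tuple[list[dict], int]:
    """Return records where record[field] == value, plus how many lines were parsed."""
    raw = zlib.decompress(blob)        # read + decompress the whole stream
    hits, parsed = [], 0
    for line in raw.splitlines():      # parse every record...
        rec = json.loads(line)
        parsed += 1
        if rec.get(field) == value:    # ...just to test a single field
            hits.append(rec)
    return hits, parsed


hits, parsed = naive_eq_lookup(blob, "id", 4242)
print(f"hits={len(hits)} parsed={parsed} compressed={len(blob)}B raw={len(ndjson)}B")
```

Finding one record still costs 10,000 `json.loads` calls plus a full decompression; that per-query work is the tax the rest of the post is about.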

SEE targets workloads where you repeatedly need:

- exists / pos / eq‑style queries
- random access
- low latency without full decompression

What SEE is (in 60 seconds)

SEE is a page‑based, schema‑aware format:

- page‑level layout for random access
- Bloom filters + page skipping to avoid touching irrelevant pages (high skip rate)
- schema‑aware encoding (structure + deltas + dictionary where useful)
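A toy model helps show how page-level layout and Bloom-based skipping fit together. The Python sketch below is not SEE's format: the page size, filter size, hash count, double-hashing scheme, and the `exists:`/`eq:` key layout are all assumptions for illustration, and zlib stands in for whatever per-page codec SEE actually uses. Each page is compressed independently and carries its own Bloom filter built over field names (for exists-style queries) and field=value pairs (for eq-style queries); a lookup decompresses only the pages whose filter answers "maybe".

```python
import hashlib
import json
import zlib

PAGE_SIZE = 256      # records per page (hypothetical)
BLOOM_BITS = 4096    # bits per page filter (hypothetical)
NUM_HASHES = 3       # bit positions per key (hypothetical)


def bloom_positions(key: str) -> list[int]:
    # Double hashing: derive NUM_HASHES positions from one SHA-256 digest.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:16], "little") | 1
    return [(h1 + i * h2) % BLOOM_BITS for i in range(NUM_HASHES)]


def index_keys(record: dict) -> list[str]:
    # One key per field name (exists) and one per field=value pair (eq).
    keys = []
    for field, value in record.items():
        keys.append(f"exists:{field}")
        keys.append(f"eq:{field}={json.dumps(value, sort_keys=True)}")
    return keys


def build_pages(records: list[dict]) -> list[tuple[int, bytes]]:
    pages = []
    for start in range(0, len(records), PAGE_SIZE):
        chunk = records[start:start + PAGE_SIZE]
        bloom = 0  # a Python int used as a BLOOM_BITS-wide bit array
        for rec in chunk:
            for key in index_keys(rec):
                for pos in bloom_positions(key):
                    bloom |= 1 << pos
        # Each page is compressed on its own, so it can be decoded in isolation.
        body = zlib.compress("\n".join(json.dumps(r) for r in chunk).encode("utf-8"), 9)
        pages.append((bloom, body))
    return pages


def might_contain(bloom: int, key: str) -> bool:
    return all((bloom >> pos) & 1 for pos in bloom_positions(key))


def eq_lookup(pages: list[tuple[int, bytes]], field: str, value) -> tuple[list[dict], float]:
    key = f"eq:{field}={json.dumps(value, sort_keys=True)}"
    hits, skipped = [], 0
    for bloom, body in pages:
        if not might_contain(bloom, key):
            skipped += 1  # definitely absent: page is never decompressed or parsed
            continue
        for line in zlib.decompress(body).splitlines():
            rec = json.loads(line)
            if field in rec and rec[field] == value:  # re-check: Bloom filters allow false positives
                hits.append(rec)
    return hits, skipped / len(pages)


records = [{"id": i, "user": f"u{i % 100}"} for i in range(10_000)]
pages = build_pages(records)
hits, page_skip_ratio = eq_lookup(pages, "id", 4242)
print(hits, f"skip ratio = {page_skip_ratio:.3f}")
```

A Bloom filter never gives a false negative, so skipping a page on a "no" is always safe; a "maybe" can be a false positive, which is why candidate records are still checked after decoding.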

It is designed to reduce both:

- data tax (storage/egress)
- CPU tax (decompress/parse)

The trade‑off is that SEE optimizes for low I/O and low latency rather than always for the absolute smallest size (though it can win on size too, depending on the dataset).
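Schema-aware encoding is the ingredient that can let a page format compete on size. The sketch below is a generic illustration of that technique, not SEE's encoder: the three-field record shape (`SCHEMA`), the int64 delta column, and the 1-byte dictionary codes are assumptions chosen for clarity, with zlib as the back-end compressor. It stores field names once, delta-encodes a timestamp column, and dictionary-encodes two low-cardinality string columns, then compares the result with compressed row-oriented NDJSON.

```python
import itertools
import json
import struct
import zlib

SCHEMA = ["ts", "level", "user"]  # this toy codec handles exactly this record shape

records = [
    {"ts": 1_700_000_000 + 5 * i + i % 3,
     "level": "error" if i % 10 == 0 else "info",
     "user": f"u{i % 50}"}
    for i in range(5_000)
]


def row_encode(recs: list[dict]) -> bytes:
    # Baseline: compressed NDJSON, field names repeated on every line.
    return zlib.compress("\n".join(json.dumps(r) for r in recs).encode("utf-8"), 9)


def dict_encode(values: list[str]) -> tuple[list[str], bytes]:
    # Dictionary encoding: each distinct string stored once, then 1-byte codes.
    dictionary = sorted(set(values))
    if len(dictionary) > 256:
        raise ValueError("1-byte codes support at most 256 distinct values")
    code = {v: i for i, v in enumerate(dictionary)}
    return dictionary, bytes(code[v] for v in values)


def schema_encode(recs: list[dict]) -> bytes:
    ts = [r["ts"] for r in recs]
    deltas = [ts[0]] + [b - a for a, b in zip(ts, ts[1:])]   # first value, then deltas
    ts_col = struct.pack(f"<{len(deltas)}q", *deltas)          # little-endian int64
    level_dict, level_col = dict_encode([r["level"] for r in recs])
    user_dict, user_col = dict_encode([r["user"] for r in recs])
    header = json.dumps({"schema": SCHEMA, "level": level_dict, "user": user_dict}).encode("utf-8")
    # Length-prefix each section so the blob can be split apart again.
    sections = [header, ts_col, level_col, user_col]
    return zlib.compress(b"".join(struct.pack("<I", len(s)) + s for s in sections), 9)


def schema_decode(blob: bytes) -> list[dict]:
    data, sections, off = zlib.decompress(blob), [], 0
    while off < len(data):
        (n,) = struct.unpack_from("<I", data, off)
        sections.append(data[off + 4:off + 4 + n])
        off += 4 + n
    header, ts_col, level_col, user_col = sections
    meta = json.loads(header)
    ts = itertools.accumulate(struct.unpack(f"<{len(ts_col) // 8}q", ts_col))  # undo deltas
    levels = (meta["level"][c] for c in level_col)
    users = (meta["user"][c] for c in user_col)
    return [dict(zip(meta["schema"], row)) for row in zip(ts, levels, users)]


row_blob = row_encode(records)
schema_blob = schema_encode(records)
print(f"row NDJSON + zlib: {len(row_blob)} B, schema-aware + zlib: {len(schema_blob)} B")
```

On this deliberately regular synthetic data the gap is very large; real data narrows it, which is consistent with the caveat that size wins depend on the dataset.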

KPI snapshot (public demo)

These are the numbers reported by the demo pack:

- Combined size ratio: ≈ 19.5% of raw
- Lookup latency (present): p50 ≈ 0.18 ms / p95 ≈ 0.28 ms / p99 ≈ 0.34 ms
- Skip ratio: present ≈ 0.99 / absent ≈ 0.992
- Bloom density: ≈ 0.30

“Combined” is the total footprint for the SEE artifact on the benchmarked dataset.
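The post does not give formulas for these KPIs. The sketch below shows one common way to define them, with all inputs hypothetical; in particular it assumes "Bloom density" means the fraction of filter bits set to 1, and it assumes a hash count `k`, which the source does not state.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    pages_total: int    # pages the query could have touched
    pages_skipped: int  # pages ruled out by the Bloom check


def size_ratio(artifact_bytes: int, raw_bytes: int) -> float:
    # Total artifact footprint relative to the raw input size.
    return artifact_bytes / raw_bytes


def compute_skip_ratio(stats: list[QueryStats]) -> float:
    # Pooled over a batch of queries: skipped page visits / possible page visits.
    return sum(s.pages_skipped for s in stats) / sum(s.pages_total for s in stats)


def bloom_density(bits_set: int, bits_total: int) -> float:
    # Assumed definition: fraction of filter bits that are set to 1.
    return bits_set / bits_total


def est_false_positive_rate(density: float, num_hashes: int) -> float:
    # An absent key passes the filter only if all num_hashes probed bits are set;
    # treating bits as independent gives density ** num_hashes.
    return density ** num_hashes


# Hypothetical inputs, chosen only to exercise the formulas.
ratio = size_ratio(artifact_bytes=195, raw_bytes=1_000)
skip = compute_skip_ratio([QueryStats(500, 496), QueryStats(500, 496)])
density = bloom_density(bits_set=1_229, bits_total=4_096)
fpr = est_false_positive_rate(0.30, num_hashes=4)  # k = 4 is an assumption
print(f"ratio={ratio:.3f} skip={skip:.3f} density={density:.2f} fpr={fpr:.4f}")
```

With a density of 0.30 and an assumed k = 4, the estimate is 0.3^4 ≈ 0.008, so an absent-key lookup would be expected to skip about 99.2% of pages. Because k is not given in the post, treat this only as an illustration of how density and skip ratio relate.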

(Figure: KPI chart)

Proof‑first distribution (so you can verify without meetings)

I intentionally ship reproducible packs:

- Demo ZIP (≈10 min)
  - prebuilt wheel + sample .see artifacts
  - demo scripts that print KPIs (ratio/skip/bloom/p50–p99)
  - One‑pager PDF
- DD Pack (audit / repro artifacts)
  - run summaries + run_metrics.json
  - verification checklist (pack_verify.txt)
  - designed for technical diligence

Recent robustness milestone: strict decode‑mismatch checks across multiple datasets found zero mismatches (decode_mismatch_count=0, decode_extended_mismatch_count=0, audit PASS).
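The post does not describe how the mismatch check is implemented. One common way to build such an audit is a round-trip test: encode each page, decode it, and compare every record with the original. The sketch below assumes a stand-in zlib + NDJSON page codec with hypothetical `encode_page`/`decode_page` functions; only the counter name `decode_mismatch_count` comes from the post, and `decode_extended_mismatch_count` is not modeled because its definition is not given.

```python
import json
import zlib
from typing import Callable


def encode_page(records: list[dict]) -> bytes:
    # Stand-in page codec (compressed NDJSON); not SEE's real encoding.
    return zlib.compress("\n".join(json.dumps(r) for r in records).encode("utf-8"))


def decode_page(blob: bytes) -> list[dict]:
    return [json.loads(line) for line in zlib.decompress(blob).splitlines()]


def canonical(record: dict) -> str:
    # Compare canonical JSON so key order/whitespace are not counted as mismatches.
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def audit(
    datasets: dict[str, list[dict]],
    encode: Callable[[list[dict]], bytes] = encode_page,
    decode: Callable[[bytes], list[dict]] = decode_page,
    page_size: int = 128,
) -> dict:
    decode_mismatch_count = 0
    for records in datasets.values():
        for start in range(0, len(records), page_size):
            page = records[start:start + page_size]
            restored = decode(encode(page))
            # Missing or extra records count as mismatches too.
            decode_mismatch_count += abs(len(page) - len(restored))
            decode_mismatch_count += sum(
                canonical(a) != canonical(b) for a, b in zip(page, restored)
            )
    return {
        "decode_mismatch_count": decode_mismatch_count,
        "audit": "PASS" if decode_mismatch_count == 0 else "FAIL",
    }


datasets = {
    "users": [{"id": i, "name": f"user-{i}", "active": i % 2 == 0} for i in range(1_000)],
    "events": [{"ts": 1_700_000_000 + i, "score": i / 3, "tags": ["a", "é"]} for i in range(1_000)],
}
print(audit(datasets))


def lossy_decode(blob: bytes) -> list[dict]:
    # A deliberately broken decoder that drops the "id" field; the audit should catch it.
    return [{k: v for k, v in r.items() if k != "id"} for r in decode_page(blob)]


print(audit(datasets, decode=lossy_decode))
```

Comparing canonical JSON is a design choice; a stricter audit could compare the exact original bytes instead, which would also flag key reordering.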

Quick start (demo)

```shell
pip install see_proto
python samples/quick_demo.py
```

The script prints:

- compression ratio
- skip/bloom statistics
- lookup latency (p50/p95/p99)

What I’m looking for

SEE is not a SaaS product. I’m exploring strategic acquisition or an exclusive license with teams that have a clear integration path.

To keep evaluations high‑signal, I run only a small number of NDA evaluations per month. If you’re on a data platform / infra / storage team and can see where this fits, I’d love to hear from you.
