How a pure-Python jq ended up 40x faster than the C bindings

Published: (June 11, 2026 at 02:13 AM EDT)
Source: Dev.to

I spent yesterday building purejq, a jq package on PyPI (the C

| workload | purejq | jq PyPI (C bindings) |
|---|---|---|
| field-access stream | 9 ms | 368 ms |
| filter + count | 55 ms | 442 ms |
| map + aggregate | 18 ms | 444 ms |
| group_by | 112 ms | 704 ms |
| transform + sort | 136 ms | 899 ms |

Pure Python, 7-40x faster than the C extension. That number looked wrong to [...] tools/bench.py (verify) [...]

The C bindings wrap real jq, and real jq only speaks JSON. So every call makes this round trip:

your dicts -> JSON text -> C parser -> jq evaluates -> JSON text -> dicts

That round trip costs about 350-450 ms for 100k small objects on my [...]

purejq skips the trip entirely. It compiles the jq program once into Python [...]

```python
import purejq

# data is your already-parsed Python objects, e.g. a list of dicts
prog = purejq.compile("group_by(.team) | map({team: .[0].team, n: length})")
prog.first(data)  # operates on your objects, no serialization
```
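To make the cost of that boundary concrete, here is a minimal sketch that uses only the standard library. It does not call the real jq bindings: `via_json_round_trip` is a hypothetical stand-in that serializes and re-parses the data the way a JSON-only engine forces you to, while `direct` works on the objects in place. The record shape and the 100k size are made up for illustration, and timings will differ by machine.

```python
import json
import time

def direct(records):
    # Field access on the Python objects themselves: no serialization.
    return [r["score"] for r in records]

def via_json_round_trip(records):
    # Hypothetical stand-in for a JSON-only engine behind a C boundary:
    # objects -> JSON text -> (engine runs here) -> JSON text -> objects.
    text_in = json.dumps(records)
    parsed = json.loads(text_in)           # the engine's parse step
    result = [r["score"] for r in parsed]  # the actual query work
    text_out = json.dumps(result)          # the engine's output step
    return json.loads(text_out)

def time_ms(fn, records):
    start = time.perf_counter()
    fn(records)
    return (time.perf_counter() - start) * 1000.0

if __name__ == "__main__":
    records = [{"id": i, "team": f"t{i % 10}", "score": i % 100}
               for i in range(100_000)]
    print(f"direct:     {time_ms(direct, records):.1f} ms")
    print(f"round trip: {time_ms(via_json_round_trip, records):.1f} ms")
```

Both functions return the same list; the only difference is how many times the data crosses a text representation on the way.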

The lesson generalizes beyond jq: when you embed a C library that has its [...] in your language gets to skip the boundary, [...]

This one I really didn’t expect. End to end on a 93 MB file (1M objects), [...]

| workload | purejq CLI | jq 1.8.1 binary |
|---|---|---|
| single lookup | 0.51 s | 1.68 s |
| filter + count | 1.08 s | 1.96 s |
| group_by | 2.32 s | 3.89 s |

No trick here either, just arithmetic: on large files, most of the wall [...] json module parses [...]

To be fair to jq: on already-parsed streams in a shell pipeline, or small [...]

A few things mattered more than I expected:

- Compile once, run many. Programs become nested Python closures; evaluation never touches the AST again.
- Static binding. If a program never redefines select, the call is resolved at compile time instead of walking scopes at runtime.
- Single-output fast paths. Things like .score * 2 + 1 provably yield exactly one value, so they compile to plain function calls instead of generators. Object literals with constant keys skip the generator product entirely.
- Let C do the sorting. When sort keys are uniformly strings or numbers, sort_by/group_by/unique fall through to Python’s native sort instead of a comparison callback. That one change was worth 5x on sort-heavy workloads.
- PyPy for free. Pure Python means PyPy just works: another 2-9x on heavy workloads (map+aggregate drops from 18 ms to 2 ms).

Claiming “it’s jq” is easy; the repo vendors jq’s official test suite and [...]

One more disclosure, since the commit history shows it anyway: I’m a [...] verify output. Make of that what you will.

pip install purejq
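For readers who want to see what "compile once" and "single-output fast paths" can look like, here is a small sketch. It is not purejq's code: the tuple-based mini-AST, `is_single`, `compile_single`, and `compile_stream` are all hypothetical names invented for this illustration, and the ordering of multi-output results is not checked against jq.

```python
from itertools import product

# Hypothetical mini-AST (not purejq's real representation):
#   ("field", "score")            -> .score
#   ("const", 2)                  -> 2
#   ("mul", a, b), ("add", a, b)  -> a * b, a + b
#   ("each",)                     -> .[]  (may yield zero or many values)

def is_single(node):
    """True when the node provably yields exactly one output."""
    kind = node[0]
    if kind in ("field", "const"):
        return True
    if kind in ("mul", "add"):
        return is_single(node[1]) and is_single(node[2])
    return False

def compile_single(node):
    """Compile a single-output node to a plain function: value -> value."""
    kind = node[0]
    if kind == "field":
        name = node[1]
        return lambda v: v[name]
    if kind == "const":
        c = node[1]
        return lambda v: c
    left, right = compile_single(node[1]), compile_single(node[2])
    if kind == "mul":
        return lambda v: left(v) * right(v)
    if kind == "add":
        return lambda v: left(v) + right(v)
    raise ValueError(kind)

def compile_stream(node):
    """Compile any node to a generator function: value -> iterator."""
    if is_single(node):
        f = compile_single(node)  # fast path: one plain call, one yield
        def run_single(v):
            yield f(v)
        return run_single
    kind = node[0]
    if kind == "each":
        return lambda v: iter(v)
    if kind in ("mul", "add"):
        left, right = compile_stream(node[1]), compile_stream(node[2])
        op = (lambda a, b: a * b) if kind == "mul" else (lambda a, b: a + b)
        def run_product(v):
            # General path: combine every pair of outputs.
            for a, b in product(left(v), right(v)):
                yield op(a, b)
        return run_product
    raise ValueError(kind)

# .score * 2 + 1, compiled once and reused for every input
expr = ("add", ("mul", ("field", "score"), ("const", 2)), ("const", 1))
prog = compile_stream(expr)
```

After compilation, `prog` is just nested closures: calling `list(prog({"score": 20}))` never looks at `expr` again, and because the whole expression is single-output, the inner arithmetic runs as ordinary function calls with a single generator at the top.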
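The "Let C do the sorting" point can be sketched with the standard library alone. This is an assumption-laden illustration, not purejq's implementation: `sort_by`, `is_number`, and `generic_compare` are hypothetical helpers, and `generic_compare` only covers numbers and strings rather than the full ordering jq defines across all types.

```python
from functools import cmp_to_key

def is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)

def generic_compare(a, b):
    # Simplified stand-in for a cross-type ordering: numbers sort before
    # strings, and values of the same type compare normally.
    a_is_str, b_is_str = isinstance(a, str), isinstance(b, str)
    if a_is_str != b_is_str:
        return 1 if a_is_str else -1
    return (a > b) - (a < b)

def sort_by(items, key):
    keys = [key(x) for x in items]
    uniform = (all(isinstance(k, str) for k in keys)
               or all(is_number(k) for k in keys))
    if uniform:
        # Fast path: the built-in sort compares the keys natively.
        return sorted(items, key=key)
    # Slow path: a Python-level callback runs for every comparison.
    compare = cmp_to_key(lambda a, b: generic_compare(key(a), key(b)))
    return sorted(items, key=compare)
```

The uniformity check costs one pass over the keys, which is cheap next to the number of comparisons a sort performs; the callback path is kept only for inputs that mix types.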

Repo: https://github.com/adam2go/purejq — issues and PRs welcome, especially if you try it on Pyodide or PyPy.
