DjVu and its connection to Deep Learning (2023)

Published: 3 days ago (February 15, 2026 at 04:05 AM EST)


Source: Hacker News

DjVu vs. PDF

Compared to the original PDF, DjVu is a vastly superior file format for books, mathematical papers, and just about anything else you can think of (modern PDFs have adopted some of its innovations, but they’re often hidden behind layers of complexity). PDF is essentially PostScript with a bunch of metadata and layers. This works fine when the PDF is generated by LaTeX and intended for printing, but most of the world’s useful text still lives on paper that must be scanned to become digital.

DjVu excels at sharing compressed book scans, whereas PDF does not. When a scanned book is saved as a PDF, it usually consists of a series of page images stored as photographic JPEGs (or TIFFs). JPEG is poor at representing text because its discrete cosine transform (DCT) is optimized for natural images, not sharp character edges. DjVu, on the other hand, assumes the page is a mix of text and images and compresses each part with a method suited to it, allowing it to discard much of the redundant information. This is a good assumption: most users only need the text and plots, and DjVu captures those well. PDF typically stores everything in a scan as a single bitmap (or as JPEGs), which is far less efficient.
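To make the DCT point concrete, here is a minimal sketch in plain Python. It is not how JPEG actually works (JPEG applies a 2-D DCT to 8×8 blocks and quantizes the coefficients rather than truncating them); it only assumes that discarding high-frequency DCT coefficients is a fair stand-in for lossy compression. The function names and the two test signals are made up for illustration.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1 / n)
        s += sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * k * (i + 0.5) / n)
                 for k in range(1, n))
        out.append(s)
    return out

def lowpass_error(signal, keep):
    """Keep only the first `keep` coefficients and return the worst pixel error."""
    coeffs = dct(signal)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    approx = idct(truncated)
    return max(abs(a - b) for a, b in zip(signal, approx))

N = 16
gradient = [i / (N - 1) for i in range(N)]        # smooth, "photo-like" ramp
edge = [0.0] * (N // 2) + [1.0] * (N // 2)        # sharp, "text-like" edge

smooth_err = lowpass_error(gradient, keep=4)
edge_err = lowpass_error(edge, keep=4)
print(f"gradient error: {smooth_err:.3f}, edge error: {edge_err:.3f}")
```

With the same coefficient budget, the sharp edge reconstructs noticeably worse than the smooth gradient, which is the ringing you see around letters in over-compressed JPEG scans.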

[Figure: Why JPEG sucks on text]

The People Behind DjVu

Yann LeCun, Léon Bottou, and Yoshua Bengio were among the creators of DjVu, together with other contributors such as Patrick Haffner and Bill Riemers. All three are also fathers of deep learning (along with Geoffrey Hinton, who famously developed a fear of ballpoint pens).

LeCun and Bottou also created my favorite little programming language, Lush—the Lisp Universal Shell—where they did much of their pioneering work in the 1990s. While I understand why modern frameworks migrated from Lush to Lua and then to Python, Lush remains one of my all‑time favorite designs. It is as interesting in its own right as the K family, and because it never had to handle the mess of order‑book maintenance, it feels remarkably comfortable to use. When I retire I’ll probably revive it to power superior robot vacuum cleaners or something—its design is that good. There’s a lot of fascinating, almost “Leonardo‑notebook” material in its R&D history, from the Ogre UI to the code‑book editor.

[Image: Yann LeCun]

Since deep-learning models dominate current discussions, the earlier work of their creators is worth revisiting for historical perspective. In 1998 the Internet was still young, and PDFs were not yet reliable. To share scientific papers, we often downloaded LZ77/Huffman-coded (that is, gzip-compressed) PostScript files. Those files were awful: not because they required decompression, but because they were large (often four times the size of LaTeX-generated PDFs) and the network speeds of the time made downloading them painfully slow.

DjVu solved an important problem at that time: it offered very good compression ratios and made scanned material efficiently shareable online, moving the Internet toward a true “super‑library” of both printed books and generated content. The main obstacle was that most operating systems lacked a DjVu reader, whereas Adobe had already ensured that everyone had a PDF viewer. Installing a DjVu reader was a hassle, and browsers of that era could display neither PDFs nor DjVu files natively.

Technical Highlights

Background images – DjVu uses an image format called IW44, similar to JPEG 2000, for background layers. Like JPEG 2000, it employs wavelet compression, so even the first quarter of a file yields a decent low-resolution image. This provides a natural way to perform lossy compression: simply drop the finer-scale wavelet coefficients. The wavelet data are further compressed with arithmetic coding, which is a clever and efficient technique.
Foreground text – DjVu employs a format called JB2 for text (and other symbol-like elements). JB2 is related to JBIG2, the technique used in PDFs whose decoder was famously exploited to install the Pegasus spyware. The algorithm extracts bitmap fragments roughly the size of characters, then clusters geometrically similar ones, effectively creating a symbol dictionary that is agnostic to whether the symbols are actual letters. The document is then compressed by arithmetic coding of these symbols and their positions.
Arithmetic coding – DjVu’s arithmetic coder, the ZP-coder, is an innovative system. It resembles simple run-length coders (see Golomb coding) but is optimized for decoding speed by using probability tables. It is a shame that the ZP-coder has not been more widely adopted; a universal version could enable efficient generation of fake documents based on a corpus, much like modern generative models but with far lower computational cost.
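The progressive behavior described for background images can be sketched with a toy wavelet transform. This is only an illustration: it swaps in the 1-D Haar wavelet for brevity, whereas IW44 uses a more sophisticated wavelet on 2-D images, and the function names are hypothetical. Coefficients are ordered coarse to fine, so truncating the list mimics reading only the first part of a file.

```python
def haar_forward(signal):
    """Full Haar decomposition; len(signal) must be a power of two.
    Returns [overall average, coarsest detail, ..., finest details]."""
    coeffs = []
    current = list(signal)
    while len(current) > 1:
        avgs = [(current[i] + current[i + 1]) / 2 for i in range(0, len(current), 2)]
        diffs = [(current[i] - current[i + 1]) / 2 for i in range(0, len(current), 2)]
        coeffs = diffs + coeffs      # finer details end up later in the list
        current = avgs
    return current + coeffs

def haar_inverse(coeffs):
    """Rebuild the signal from coefficients produced by haar_forward()."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(current)]
        nxt = []
        for a, d in zip(current, diffs):
            nxt += [a + d, a - d]
        pos += len(current)
        current = nxt
    return current

signal = [2, 4, 6, 8, 10, 12, 14, 16]
coeffs = haar_forward(signal)

# Simulate receiving only the first quarter of the "file": zero the rest.
keep = len(coeffs) // 4
truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
preview = haar_inverse(truncated)
print(preview)   # [5.0, 5.0, 5.0, 5.0, 13.0, 13.0, 13.0, 13.0]
```

The preview is a blocky but recognizable version of the original ramp; feeding in more coefficients refines it until the signal is recovered exactly.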
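The symbol-dictionary idea behind the foreground layer can also be sketched in a few lines. This is not the real JB2 algorithm, just a greedy toy under stated assumptions: glyphs are tiny fixed-size bitmaps, "geometrically similar" means a small Hamming distance, and the names `build_dictionary` and `max_diff` are invented for this example.

```python
def hamming(a, b):
    """Number of differing pixels between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def build_dictionary(glyphs, max_diff=1):
    """Greedy clustering: reuse a dictionary entry when a glyph is within
    max_diff pixels of it, otherwise add the glyph as a new entry."""
    dictionary = []   # one prototype bitmap per cluster
    refs = []         # for each glyph on the page, the index of its prototype
    for g in glyphs:
        for idx, proto in enumerate(dictionary):
            same_size = len(proto) == len(g) and len(proto[0]) == len(g[0])
            if same_size and hamming(proto, g) <= max_diff:
                refs.append(idx)
                break
        else:
            dictionary.append(g)
            refs.append(len(dictionary) - 1)
    return dictionary, refs

# Two "letters" as 3x3 bitmaps, plus a noisy scan of the first one.
X = ("#.#", ".#.", "#.#")
X_NOISY = ("#.#", ".#.", "#..")
O = ("###", "#.#", "###")

page = [X, O, X_NOISY, O, X]
dictionary, refs = build_dictionary(page)
print(len(dictionary), refs)   # 2 [0, 1, 0, 1, 0]
```

Five glyphs on the page collapse to two stored bitmaps plus a list of small integers, and nothing in the code knows or cares that the shapes are letters. A real encoder would also store each symbol's position and entropy-code the result.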

DjVu remains a powerful, under‑appreciated format for scanned documents, offering superior compression and flexibility compared to traditional PDFs.

DjVu: A Forgotten Superior Format

It’s a shame DjVu didn’t catch on better, and there is probably an HBS case study in the full story of why the objectively superior tool failed in the market. It even failed to stick at the Internet Archive, a use it was also well suited for. DjVu is still useful for storing and reading scanned documents. Its main problem is the same one it has always had: lack of support. Black-and-white e-book readers like the Kindle and the Kobo don’t support it natively, despite it being just about the perfect format for scanned documents on a limited-processor greyscale e-book reader.

I personally use a Kobo Forma rooted with the excellent KOReader to get access to the many useful DjVu files I have (basically all my textbooks, available on the road). It’s ridiculous that I have to hack a device to get access to physically portable DjVu files, but I suppose scanned books don’t make anybody money.

I’ve long held that most of the knowledge developed since the advent of the internauts is basically anti‑knowledge, meaning those scanned books in DjVu are potentially more valuable than all the PDFs in the universe. It would be nice to see it used by more mainstream publishers, but the lack of a DjVu target for things like LaTeX means it probably won’t be. I guess in the meantime DjVu is the most punk‑rock document format.

https://en.wikipedia.org/wiki/DjVu
