Behind the scenes at Perl School Publishing

Published: December 14, 2025 at 12:46 PM EST

Source: Dev.to

We’ve just published a new Perl School book: Design Patterns in Modern Perl by Mohammad Sajid Anwar.

It’s been a while since we last released a new title, and in the meantime the world of eBooks has moved on – Amazon no longer uses .mobi, tools have changed, and my old “it mostly works if you squint” build pipeline was starting to creak.

On top of that we had a hard deadline: we wanted the book ready in time for the London Perl Workshop. As the date loomed, last‑minute fixes and manual tweaks became more and more terrifying. We really needed a reliable, reproducible way to go from manuscript to “good quality PDF + EPUB” every time.

So over the last couple of weeks I’ve been rebuilding the Perl School book pipeline from the ground up. This post is the story of that process, the tools I ended up using, and how you can steal it for your own books.

The old world, and why it wasn’t good enough

The original Perl School pipeline dates back to a very different era:

Amazon wanted .mobi files.
EPUB support was patchy.
I was happy to glue things together with shell scripts and hope for the best.

It worked… until it didn’t. Each book had slightly different scripts, slightly different assumptions, and a slightly different set of last‑minute manual tweaks. It certainly wasn’t something I’d hand to a new author and say, “trust this”.

Coming back to it for Design Patterns in Modern Perl made that painfully obvious. The book itself is modern and well‑structured; the pipeline that produced it shouldn’t feel like a relic.

Choosing tools: Pandoc and `wkhtmltopdf` (and no LaTeX, thanks)

The new pipeline is built around two main tools:

Pandoc – the Swiss Army knife of document conversion. It can take Markdown/Markua plus metadata and produce HTML, EPUB, and much, much more.
wkhtmltopdf – which turns HTML into a print‑ready PDF using a headless WebKit rendering engine.

Why not LaTeX? Because I’m allergic. LaTeX is enormously powerful, but every time I’ve tried to use it seriously I’ve ended up debugging page breaks in a language I don’t enjoy. HTML + CSS I can live with; browsers I can reason about.

Conversion flow

PDF route

Markdown → HTML (via Pandoc) → PDF (via wkhtmltopdf)

EPUB route

Markdown → EPUB (via Pandoc) → validated with epubcheck

The front matter (cover page, title page, copyright, etc.) is generated with Template Toolkit from a simple book-metadata.yml file, then stitched together with the chapters to produce a nice, consistent book.
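The two routes above might be sketched as shell functions like the ones below. This is not the real make_book script (which is Perl and whose options aren't shown in this post); the file names, the `EPUBCHECK_JAR` and `DRY_RUN` variables, and the function names are all assumptions for illustration. Setting `DRY_RUN=1` prints each command instead of running it.

```sh
# Hypothetical sketch of the two conversion routes; not the real make_book.
# Assumed names: EPUBCHECK_JAR, DRY_RUN, build_pdf, build_epub, book.* files.

EPUBCHECK_JAR="${EPUBCHECK_JAR:-/usr/local/epubcheck/epubcheck.jar}"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# PDF route: Markdown -> standalone HTML (Pandoc) -> PDF (wkhtmltopdf).
build_pdf() {
    src=$1 out=$2
    run pandoc --standalone --from markdown --to html5 \
        --output "$out/book.html" "$src" &&
    run wkhtmltopdf "$out/book.html" "$out/book.pdf"
}

# EPUB route: Markdown -> EPUB 3 (Pandoc) -> validation (epubcheck).
build_epub() {
    src=$1 out=$2
    run pandoc --from markdown --to epub3 \
        --output "$out/book.epub" "$src" &&
    run java -jar "$EPUBCHECK_JAR" "$out/book.epub"
}
```

For example, `DRY_RUN=1 build_pdf book.md built` prints the Pandoc and wkhtmltopdf commands without running either tool.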

That got us a long way… but then a reader found a bug.

The iBooks bug report

Shortly after publication, a reader who bought the Leanpub EPUB and was reading it in Apple Books (iBooks) saw a big pink error box:

There’s something wrong with the XHTML in this EPUB.

Apple Books is quite strict about the “X” in XHTML: it expects well‑formed XML, not just “kind of valid HTML”. When working with EPUB you need to forget the HTML5 flexibility you’ve grown used to.

Discovering `epubcheck`

epubcheck is the reference validator for EPUB files. Point it at an .epub and it will unpack it, parse all the XML/XHTML, check the metadata and manifest, and tell you exactly what’s wrong.

Running it on the book immediately produced:

Fatal Error while parsing file: The element type `br` must be terminated by the matching end-tag `</br>`.

In HTML, `<br>` is fine; in XHTML (which is XML) you must use `<br />` (self‑closing) or `<br></br>`. A number of these appeared across a few chapters. Pandoc had passed raw HTML straight through into the EPUB, but that HTML was not strictly valid XHTML, so Apple Books rejected it.

A quick (but not scalable) fix

Under time pressure the quickest way to confirm the diagnosis was:

Unzip the generated EPUB.
Open the offending XHTML file.
Manually change `<br>` to `<br />` in a couple of places.
Re‑zip the EPUB.
Run epubcheck again.
Try it in Apple Books.
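At the command line, the manual repair above could look roughly like this. It is a sketch under assumptions: the function names are made up, the Info-ZIP `zip` and `unzip` tools are assumed to be installed, and the `sed` expression only catches a bare lowercase `<br>`. The one EPUB-specific detail is that the `mimetype` file must be the first entry in the archive and stored uncompressed.

```sh
# Hypothetical sketch of the manual repair; function names are assumptions.

# fix_br: rewrite bare <br> tags on stdin as self-closing <br />.
# Only the exact lowercase form "<br>" is handled.
fix_br() {
    sed 's|<br>|<br />|g'
}

# repair_epub BROKEN.epub FIXED.epub
repair_epub() {
    broken=$1 fixed=$2
    work=$(mktemp -d) || return 1
    unzip -q "$broken" -d "$work" || return 1

    # Patch every XHTML content document in the unpacked book.
    find "$work" -name '*.xhtml' | while IFS= read -r f; do
        fix_br < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done

    # Make the output path absolute, because zip runs from inside $work.
    case $fixed in /*) ;; *) fixed=$PWD/$fixed ;; esac
    rm -f "$fixed"

    # "mimetype" goes in first and uncompressed (-0); the rest follows.
    (
        cd "$work" &&
        zip -q -X -0 "$fixed" mimetype &&
        zip -q -X -r -9 -D "$fixed" * -x mimetype
    )
    status=$?
    rm -rf "$work"
    return $status
}
```

After something like `repair_epub built/book.epub built/book-fixed.epub`, the repaired file still needs to go back through epubcheck.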

The errors vanished, epubcheck was happy, and the reader confirmed the fixed file opened fine. However, “open the EPUB in a text editor and fix the XHTML by hand” is not a sustainable publishing strategy.

HTML vs XHTML, and why linters matter

The underlying issue is straightforward:

HTML is very forgiving; browsers will fix broken markup.
XHTML is XML, so it’s not forgiving. EPUB 3 content files are XHTML; sloppy HTML will cause some readers (like Apple Books) to refuse to load the chapter.

So I added a manuscript HTML linter to the toolchain, which runs before we ever get to Pandoc or epubcheck.

Roughly, the linter:

Reads the manuscript (ignoring fenced code blocks so it doesn’t complain about < in Perl examples).
Extracts any raw HTML chunks.
Wraps those chunks in a temporary root element.
Uses XML::LibXML to check they’re well‑formed XML.
Reports any errors with file and line number.

It’s not a full HTML validator; it simply asks, “If this HTML ends up in an EPUB, will the XML parser choke?” This would have caught the `<br>` problem before the book ever left my machine.
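The same idea can be sketched in shell. This is not the check_ms_html script: it swaps XML::LibXML for `xmllint`, the command-line tool that ships with libxml2, and the function names are assumptions. It only understands backtick fences and simple tag-shaped text, so treat it as an illustration of the approach rather than a replacement for the real linter.

```sh
# Hypothetical sketch of a manuscript linter; not the check_ms_html script.
# It swaps XML::LibXML for xmllint, the command-line tool shipped with libxml2.

# strip_fences: blank out ``` fenced code blocks, keeping line numbering intact.
strip_fences() {
    awk '
        /^```/   { in_fence = !in_fence; print ""; next }
        in_fence { print ""; next }
                 { print }
    '
}

# html_tags_only: keep only tag-like text (<br>, </div>, <img src="x" />),
# one output line per input line. Markdown autolinks such as <https://...>
# are skipped because ":" cannot follow the tag name in this pattern.
html_tags_only() {
    awk '
        BEGIN { tag = "</?[A-Za-z][A-Za-z0-9]*([ \t/][^<>]*)?>" }
        {
            out = ""; rest = $0
            while (match(rest, tag)) {
                out  = out substr(rest, RSTART, RLENGTH)
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out
        }
    '
}

# lint_manuscript FILE: wrap the extracted tags in a temporary root element
# and ask xmllint whether the result is well-formed XML.
lint_manuscript() {
    file=$1
    errors=$(
        {
            printf '<root>'
            strip_fences < "$file" | html_tags_only
            printf '</root>\n'
        } | xmllint --noout - 2>&1
    ) && return 0
    printf '%s: raw HTML is not well-formed XML\n%s\n' "$file" "$errors" >&2
    return 1
}
```

Because every input line produces exactly one output line, the line numbers in xmllint's messages should correspond to lines in the manuscript file.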

Hardening the pipeline: `epubcheck` in the loop

The linter catches obvious issues in the manuscript; epubcheck remains the final authority on the finished EPUB.

The pipeline now looks like this:

Lint the manuscript HTML – catch broken raw HTML/XHTML before conversion.
Build PDF + EPUB via make_book.
Run epubcheck on the EPUB – ensure the final file is standards‑compliant.
Only then upload to Leanpub and Amazon.
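The important property of that list is that it is fail-fast: nothing gets uploaded unless every earlier step succeeded. A minimal sketch of that gating, with `run_pipeline` as a made-up helper name, could be:

```sh
# Hypothetical helper: run each step in order and stop at the first failure,
# so a later step (such as an upload) never runs on a broken build.
run_pipeline() {
    for step in "$@"; do
        printf '==> %s\n' "$step"
        if ! sh -c "$step"; then
            printf 'FAILED: %s\n' "$step" >&2
            return 1
        fi
    done
}

# Example, using the commands from this post (run from the book repo):
#   run_pipeline 'perl check_ms_html' 'perl make_book' \
#       'java -jar /usr/local/epubcheck/epubcheck.jar built/*.epub'
```

Each step is passed as a single string and run through `sh -c`, so globs such as `built/*.epub` are expanded only when that step actually runs.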

Any future changes (new CSS, new template, different metadata) still go through the same gauntlet, and the pipeline shouts at me long before a reader has to.

Docker and GitHub Actions: making it reproducible

Having a nice Perl script and a list of tools installed on my laptop is fine for a solo project; it’s not great if:

other authors might want to build their own drafts, or
I want the build to happen automatically in CI.

The next step was to package everything into a Docker image and wire it into GitHub Actions.

Docker image contents

Perl + cpanm + all CPAN modules from the repo’s cpanfile
pandoc
wkhtmltopdf
Java + epubcheck
The Perl School utility scripts themselves (make_book, check_ms_html, etc.)

Typical workflow in a book repo

```sh
# Mount the book's Git repo into /work
docker run --rm -v "$(pwd)":/work perl-school-builder \
    perl check_ms_html   # lint the manuscript

docker run --rm -v "$(pwd)":/work perl-school-builder \
    perl make_book       # build built/*.pdf and built/*.epub

docker run --rm -v "$(pwd)":/work perl-school-builder \
    java -jar /usr/local/epubcheck/epubcheck.jar built/*.epub
```
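Since all three commands share the same `docker run` prefix, a small wrapper function can cut down the repetition. The names `in_builder`, `BUILDER_IMAGE` and `DRY_RUN` are assumptions for this sketch and are not part of the Perl School tooling.

```sh
# Hypothetical wrapper: run a command inside the builder image with the
# current directory mounted at /work, as in the commands above.
# BUILDER_IMAGE and DRY_RUN are assumed names, not part of the original setup.
in_builder() {
    set -- docker run --rm -v "$(pwd)":/work \
        "${BUILDER_IMAGE:-perl-school-builder}" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"      # show the command instead of running it
    else
        "$@"
    fi
}
```

With that in place, the workflow becomes `in_builder perl check_ms_html && in_builder perl make_book`, followed by the epubcheck call in the same style.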

With everything containerized and automated, any author can reproduce the exact same build, and the CI pipeline makes sure that only EPUBs that pass epubcheck are ever published.
