I Built an AI Pipeline for Books, Here's the Architecture
Source: Dev.to
Here’s What We Learned From 50K Books
Most AI‑writing tools are just chat wrappers: paste a prompt, get text, copy it into Google Docs, repeat. For a full book that means hundreds of round‑trips and a total loss of context between them.
I’ve spent three years in the AI + publishing space—publishing books myself, building a reading platform (NanoReads, 130+ books, 341K readers), and talking to hundreds of authors. The same complaints kept coming up:
- AI loses track of what happened ten chapters ago.
- Every chapter sounds different.
- Dialogue is flat.
- The output is full of “Moreover…”, “Furthermore…”, “It’s worth noting that…”.
These aren’t model‑quality problems. After generating 50K+ books on our platform (AIWriteBook), we’re confident the bottleneck is the specification pipeline, not the language model.
The Architecture
We treat book creation as a multi‑stage compilation pipeline:
```
Book Metadata → Character Graph → Chapter Outlines → Chapter Content
      |                |                  |                 |
   (schema)         (schema)           (schema)        (streaming)
```
Each stage produces schema‑constrained structured output that feeds the next stage. Nothing is free‑form until the final prose generation.
Stage 1 – Book Metadata
The user supplies a title and a short description. The AI then generates a structured metadata object that becomes the single source of truth for everything downstream.
```json
{
  "title": "The Dragon's Reluctant Mate",
  "genres": ["Fantasy", "Romance"],
  "tone": ["dark", "romantic", "suspenseful"],
  "style": ["dialogue-heavy", "fast-paced"],
  "target_audience": "Adult fantasy romance readers",
  "plot_techniques": ["enemies-to-lovers", "slow-burn", "foreshadowing"],
  "writing_style": "..."
}
```
Tone, style, and audience are constraints, not suggestions.
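A lightweight way to make those constraints binding is to validate the model's output against the schema before anything downstream consumes it. A minimal sketch in TypeScript (field names mirror the example above; sample values are illustrative, and the real pipeline's validator is presumably stricter):

```typescript
// Shape of the Stage 1 output (mirrors the JSON example above).
interface BookMetadata {
  title: string;
  genres: string[];
  tone: string[];
  style: string[];
  target_audience: string;
  plot_techniques: string[];
  writing_style: string;
}

// Type guard: reject any model output that drifts from the schema,
// so later stages never see malformed metadata.
function isBookMetadata(value: unknown): value is BookMetadata {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const isStrArray = (x: unknown): boolean =>
    Array.isArray(x) && x.every((s) => typeof s === "string");
  return (
    typeof v.title === "string" &&
    isStrArray(v.genres) &&
    isStrArray(v.tone) &&
    isStrArray(v.style) &&
    typeof v.target_audience === "string" &&
    isStrArray(v.plot_techniques) &&
    typeof v.writing_style === "string"
  );
}

// Illustrative sample (the writing_style value is a placeholder).
const sampleMetadata = {
  title: "The Dragon's Reluctant Mate",
  genres: ["Fantasy", "Romance"],
  tone: ["dark", "romantic", "suspenseful"],
  style: ["dialogue-heavy", "fast-paced"],
  target_audience: "Adult fantasy romance readers",
  plot_techniques: ["enemies-to-lovers", "slow-burn", "foreshadowing"],
  writing_style: "...",
};
```

Anything that fails the guard gets regenerated rather than passed along, which is what keeps later stages predictable about their inputs.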
Stage 2 – Character Graph
Each character is a structured node containing voice, motivation, arc, and internal conflict. When generating a chapter we only pass the characters that actually appear, together with their current arc position and relationship dynamics.
```json
{
  "name": "Kira Ashvane",
  "role": "protagonist",
  "voice": "Sharp, clipped sentences. Uses sarcasm as defense.",
  "motivation": "Prove she doesn't need the dragon clan's protection",
  "internal_conflict": "Craves belonging but fears vulnerability",
  "arc": "Isolation → reluctant alliance → trust → sacrifice"
}
```
Because the model receives explicit voice specs per character, dialogue no longer sounds homogeneous.
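The "only pass the characters that actually appear" step reduces to a lookup from the chapter's interaction list into the graph. A sketch, with shapes assumed from the examples in this post (the placeholder `"..."` fields stand in for full specs):

```typescript
interface CharacterNode {
  name: string;
  role: string;
  voice: string;
  motivation: string;
  internal_conflict: string;
  arc: string;
}

interface Interaction {
  characters: string[];
  dynamic: string;
}

// Given the full character graph and a chapter's interaction list,
// return only the nodes the chapter actually needs. Keeping the prompt
// to the relevant subset is what lets per-character voice specs stay
// sharp instead of being diluted across the entire cast.
function charactersForChapter(
  graph: CharacterNode[],
  interactions: Interaction[],
): CharacterNode[] {
  const needed = new Set(interactions.flatMap((i) => i.characters));
  return graph.filter((c) => needed.has(c.name));
}

const graph: CharacterNode[] = [
  { name: "Kira Ashvane", role: "protagonist", voice: "Sharp, clipped sentences.", motivation: "...", internal_conflict: "...", arc: "..." },
  { name: "Draethor", role: "love interest", voice: "Formal, archaic diction.", motivation: "...", internal_conflict: "...", arc: "..." },
  { name: "Maelis", role: "mentor", voice: "Riddling, gentle.", motivation: "...", internal_conflict: "...", arc: "..." },
];

const cast = charactersForChapter(graph, [
  { characters: ["Kira Ashvane", "Draethor"], dynamic: "hostile tension" },
]);
// cast holds only Kira Ashvane and Draethor; Maelis stays out of the prompt.
```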
Stage 3 – Chapter Outlines
This is the most important stage. Every chapter gets a detailed spec that guides the downstream generation.
```json
{
  "chapter_number": 3,
  "title": "The Binding Ceremony",
  "events": [
    "Kira is forced to attend the bonding ritual",
    "..."
  ],
  "locations": [
    "Dragon temple, obsidian halls lit by bioluminescent moss"
  ],
  "twists": [
    "The ritual reveals Kira has dormant dragon magic"
  ],
  "character_interactions": [
    {
      "characters": ["Kira", "Draethor"],
      "dynamic": "hostile tension with undercurrent of curiosity"
    }
  ],
  "word_count": 2800
}
```
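Because outlines and the character graph are both structured, they can be cross-checked before prose generation: every name an outline mentions should exist in Stage 2's graph, or the prose stage would have to improvise an unspecified voice. A hypothetical consistency check:

```typescript
// Cross-stage check (sketch): collect any character named in a chapter
// outline that is missing from the Stage 2 character graph.
function undefinedCharacters(
  knownNames: string[],
  interactions: { characters: string[] }[],
): string[] {
  const known = new Set(knownNames);
  const mentioned = interactions.flatMap((i) => i.characters);
  return [...new Set(mentioned.filter((n) => !known.has(n)))];
}

const missing = undefinedCharacters(
  ["Kira", "Draethor"],
  [{ characters: ["Kira", "Draethor"] }, { characters: ["Kira", "Seren"] }],
);
// missing is ["Seren"]: reject or regenerate the outline before Stage 4.
```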
Internal A/B Test
| Metric | Default Outline | Customized Outline |
|---|---|---|
| Export rate | 16% | 34% |
| Satisfaction (out of 5) | 3.4 | 4.3 |
| Regenerations / chapter | 1.8 | 0.7 |
| Completion rate | 41% | 72% |
A mediocre model with a detailed outline beats a good model with a vague outline. As in software engineering, garbage in → garbage out.
Stage 4 – Chapter Generation
The only streaming stage. The model receives:
- Book metadata
- Relevant characters with voice specs
- The chapter’s outline
- Summaries of previous chapters (for continuity)
- Author’s writing‑style samples
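Those five inputs can be assembled into one generation context. A sketch (the section labels and the summary cap are my own assumptions, not the production prompt format):

```typescript
interface ChapterContext {
  metadata: string;            // serialized Stage 1 output
  characters: string[];        // voice specs for this chapter's cast only
  outline: string;             // serialized Stage 3 spec
  previousSummaries: string[]; // one short summary per earlier chapter
  styleSamples: string[];      // author-uploaded writing samples
}

// Assemble the prompt for the prose model. Passing summaries rather than
// full chapters preserves continuity without blowing the context window;
// only the most recent summaries are kept.
function buildPrompt(ctx: ChapterContext, maxSummaries = 10): string {
  const recent = ctx.previousSummaries.slice(-maxSummaries);
  return [
    "## Book metadata\n" + ctx.metadata,
    "## Characters in this chapter\n" + ctx.characters.join("\n"),
    "## Story so far\n" + recent.join("\n"),
    "## Style samples\n" + ctx.styleSamples.join("\n---\n"),
    "## Chapter outline\n" + ctx.outline,
  ].join("\n\n");
}
```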
We use a two‑model strategy:
- Gemini Flash – handles all structural work (fast, cheap, excels at schema‑constrained output).
- Frontier model – produces the final prose.
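The routing rule is simple enough to state in code; the model identifiers below are placeholders, not the exact strings used in production:

```typescript
type Stage = "metadata" | "characters" | "outline" | "prose";

// Route schema-constrained structural stages to the fast, cheap model
// and reserve the expensive frontier model for final prose only.
function pickModel(stage: Stage): string {
  return stage === "prose" ? "frontier-model" : "gemini-flash";
}
```

Since three of the four stages are structural, most calls in a full book run hit the cheap model.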
Voice Training
Authors can upload 3–5 writing samples. We extract style features and feed them as few‑shot examples during generation.
Results from our data:
- 2.4× higher export rate with voice training.
- 41% fewer regeneration requests.
- 67% less manual editing.
- Fewer than three samples → marginal improvement.
- More than five samples → diminishing returns.
Without voice training the output feels like generic GPT; authors either abandon the project or spend hours rewriting. With voice training, the “AI slop” problem largely disappears because the model now has concrete anchors for style.
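The sample-count findings above suggest a trivial gate in front of few-shot conditioning, sketched here (the return shape is my own invention):

```typescript
// Clamp uploaded writing samples to the 3-5 window the data supports:
// below three the gain is marginal, above five returns diminish.
function selectVoiceSamples(samples: string[]): { samples: string[]; effective: boolean } {
  return {
    samples: samples.slice(0, 5),    // cap at five
    effective: samples.length >= 3,  // flag when conditioning is worthwhile
  };
}
```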
Fiction vs. Non‑Fiction Pipelines
Fiction
Uses the character graph + plot‑continuity pipeline described above.
Non‑Fiction
A separate architecture that starts from reference material.
```
Reference Files → Content Extraction → Book Structure Selection
                                                |
                        Chapter Outlines (with assigned references)
                                                |
                          Chapter Content (with citations)
```
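One simple (and admittedly naive) way to do the "assigned references" step is keyword overlap between each reference document and each chapter title; whatever relevance scoring the real pipeline uses, the shape is the same:

```typescript
// Lowercase word set for crude lexical matching; short words are dropped.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((w) => w.length > 3));
}

// Assign each reference to the chapter whose title shares the most
// words with its text: a naive stand-in for real relevance scoring.
function assignReferences(
  chapterTitles: string[],
  references: { name: string; text: string }[],
): Map<string, string[]> {
  const chapterTokens = chapterTitles.map(tokenize);
  const result = new Map<string, string[]>(chapterTitles.map((t) => [t, []]));
  for (const ref of references) {
    const refTokens = tokenize(ref.text);
    let best = 0;
    let bestScore = -1;
    chapterTokens.forEach((tokens, i) => {
      let score = 0;
      for (const w of tokens) if (refTokens.has(w)) score++;
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    });
    result.get(chapterTitles[best])!.push(ref.name);
  }
  return result;
}
```

With references attached per chapter, the prose stage can cite concrete material instead of generalizing.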
Impact of Reference Material
| Condition | Export Rate | Satisfaction |
|---|---|---|
| With reference materials | +38% | 4.4 / 5 |
| Without reference materials | baseline | 3.5 / 5 |
When the model has concrete data—named studies, real quotes, specific statistics—it produces far more trustworthy and satisfying nonfiction.
Takeaways
- Specification matters more than model size. A detailed, schema‑driven pipeline yields higher quality than simply scaling the LLM.
- Voice specs per character prevent flat dialogue.
- Chapter outlines are the single biggest lever for consistency, continuity, and author satisfaction.
- Few‑shot voice training dramatically reduces post‑generation editing.
- Non‑fiction requires a data‑centric pipeline that injects citations and reference material early.
Treating book generation like a compiler—metadata → graph → outline → stream—turns the chaotic “prompt‑and‑hope” workflow into a predictable, repeatable production line.
Things We Learned From 50K Books
Chapter length sweet spot is 2,000–3,500 words.
- Below 2,000, chapters feel underdeveloped.
- Above 3,500, the model starts repeating itself with different phrasing, introducing tangents, and padding with unnecessary description.
- Above 5,000, quality drops hard. If a chapter needs to be long, splitting it works better than generating a single long one.
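That splitting rule can be sketched as a planner that divides a target book length into chapters near the middle of the sweet spot (for totals below 2,000 words it just returns one short chapter):

```typescript
// Split a target word count into chapters that each land in the
// 2,000-3,500 sweet spot, aiming near the midpoint (~2,750 words).
function planChapters(totalWords: number, min = 2000, max = 3500): number[] {
  const target = (min + max) / 2;
  const count = Math.max(1, Math.round(totalWords / target));
  const base = Math.floor(totalWords / count);
  const lengths: number[] = Array(count).fill(base);
  // Distribute the remainder one word at a time so the sum is exact.
  for (let i = 0; i < totalWords - base * count; i++) lengths[i]++;
  return lengths;
}
```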
Genre Matters a Lot
| Genre | Export Rate |
|---|---|
| Romance | 31% |
| Literary fiction | 11% |
| Humor | 13% |
| Poetry | 9% |
AI performs best with genres that have established conventions and abundant training data, and struggles with voice‑dependent, highly creative writing.
Only 23% of generated books get exported for publishing.
The successful books share traits:
- 3.2× more time on outline editing
- Voice training enabled in 74 % of cases
- At least one manual edit in 89 % of chapters
Books that make it to publish are iterated on, not one‑click generated.
Multilingual Quality Varies
- Spanish, French, German are close to English quality.
- Polish, Russian, Japanese, Korean are good but noticeably lower.
- Smaller languages are usable for drafts.
Quality correlates with the volume of training data. For authors in less‑represented languages, generating in English and translating often yields better results than native generation.
Stack
- Frontend: Next.js, Tailwind, Supabase client
- Backend: Supabase Edge Functions (Deno)
- AI: Gemini Flash (structural), Frontier models (prose)
- Languages: 30+ supported
Wrapping Up
The main thing we took away from building this: the quality problem in AI‑generated books is a specification problem, not a model problem.
- Vague prompt + “generate” → slop.
- Detailed character graph, structured outline, voice samples, and proper constraints → genuinely good output.
If you want to try it, there’s a free tier that gives you a full 7‑chapter book.
Happy to answer questions about the architecture, the data, or anything about AI + publishing.
Tags: #ai #writing #books #showdev #webdev #productivity