Part 4 — Retrieval Is the System

Published: January 1, 2026 at 02:50 PM EST
1 min read
Source: Dev.to


Why Most Practical GenAI Systems Are Retrieval‑Centric

  • Large language models (LLMs) are trained on static data, which leads to:
    • Stale knowledge
    • Missing domain context
    • No source attribution
    • Inability to propagate corrections
  • For real‑world applications, relying solely on the model is unacceptable.
  • Accuracy, freshness, and traceability must be provided outside the model.

Retrieval‑Augmented Generation (RAG)

RAG works by shifting responsibility from the model to the system.

System responsibilities

  • Decide what information is relevant
  • Control what the model can see
  • Ground generation in known data

Model responsibilities

  • Synthesize the retrieved information
  • Generate natural‑language output

This separation is critical: most RAG failures stem from system issues, not from the model itself.
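The split can be made concrete in code. The sketch below is a minimal, hypothetical illustration (the function names, the keyword-overlap scorer, and the sample documents are all assumptions, not a real library API): the system decides what is retrieved and what the model can see, and only then hands a grounded prompt to the model.

```python
# Hypothetical sketch of the system/model split in RAG.
# The *system* scores documents and selects context; the *model*
# would only synthesize from the prompt built here.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """System responsibility: decide what information is relevant.
    Toy scorer: rank documents by keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q_terms & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """System responsibility: control what the model can see and
    ground generation in known data, with numbered sources."""
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"

docs = [
    "Invoices are archived after 90 days.",
    "Refunds are processed within 5 business days.",
    "The office closes at 6 PM on Fridays.",
]
query = "How long do refunds take?"
prompt = build_prompt(query, retrieve(query, docs))
```

In a real system the toy scorer would be replaced by embedding or hybrid search, but the division of labor stays the same: everything above the final model call is the system's job.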

Common RAG Pitfalls

  • Poor chunk boundaries
  • Missing or incomplete metadata
  • Overly broad retrieval queries
  • Latency‑heavy pipelines

Retrieval quality determines output quality before the model is ever involved, so these issues must be addressed at the system level first.
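The first pitfall, poor chunk boundaries, is the easiest to show. A naive fixed-size split can cut a sentence in half so that neither chunk is retrievable on its own; adding overlap between chunks is a common mitigation. The sketch below is a simplified character-based illustration (the function and its parameters are assumptions for this example, not a specific library's API):

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks with overlap, so content that
    straddles a boundary still appears intact in at least one chunk.
    Simplified sketch: real chunkers split on sentences or tokens."""
    step = size - overlap  # each chunk starts `overlap` chars before the last one ended
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

text = "".join(str(i % 10) for i in range(100))  # stand-in document
chunks = chunk(text)
```

Each chunk's last 10 characters are repeated as the next chunk's first 10, which is exactly the redundancy that keeps a boundary-straddling sentence retrievable.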

Benefits of a Retrieval‑Centric Architecture

  • Manageable context windows
  • Natural reduction of hallucinations
  • Interchangeable models (the same retrieval layer can feed different models)
  • Inspectable behavior (retrieved sources are visible)

At this point, GenAI systems resemble search systems with a generative layer on top—a desirable design.
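Two of the benefits above, interchangeable models and inspectable behavior, fall out of one design choice: the retrieval layer returns its sources explicitly, and the model is just a callable plugged in on top. A minimal sketch, assuming stand-in retriever and model functions (nothing here is a real provider API):

```python
from typing import Callable

def answer_with_sources(query: str,
                        retrieve: Callable[[str], list[str]],
                        model: Callable[[str], str]) -> tuple[str, list[str]]:
    """One retrieval layer feeds any model; the retrieved sources are
    returned alongside the answer so behavior stays inspectable."""
    sources = retrieve(query)
    prompt = "\n".join(sources) + "\n\nQ: " + query
    return model(prompt), sources

# Stand-ins: one fixed retrieval layer, two interchangeable "models".
fake_retrieve = lambda q: ["Refunds take 5 business days."]
model_a = lambda p: "Answer A based on: " + p.splitlines()[0]
model_b = lambda p: "Answer B based on: " + p.splitlines()[0]

ans_a, srcs = answer_with_sources("refund time?", fake_retrieve, model_a)
ans_b, _ = answer_with_sources("refund time?", fake_retrieve, model_b)
```

Swapping `model_a` for `model_b` changes nothing about retrieval, and the returned `srcs` list is what makes the output auditable, which is the "search system with a generative layer" shape described above.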


The next post will examine cost, latency, and failure as design constraints rather than afterthoughts.
