Ship Your Product Documentation Into Your Customer's Chat Client

Published: March 16, 2026 at 06:38 AM EDT
4 min read
Source: Dev.to

Fight Hallucination

Modern LLM chat clients search the web. That works for general knowledge, but it falls short for your product: internal documentation often isn't public, and web search returns whatever ranks highest, not necessarily the authoritative version for the release your customer is using.

Result: customers ask their AI assistant about your product and receive plausible‑sounding wrong answers.
Fix: give the AI access to your actual, correct docs at query time.

Why Not RAG?

RAG (Retrieval‑Augmented Generation) requires either:

  • The customer to build and maintain a retrieval pipeline, or
  • The vendor to host one—a vector DB, embedding model, and retrieval API running 24/7 for a doc corpus that changes only a few times a year.

Either way, the infrastructure cost is disproportionate to the problem.

Design Decision: Bundle the Docs Into the MCP Server

MCP (Model Context Protocol) is a standard for connecting tools and data sources to AI chat clients.

Decision: compile the documentation directly into the MCP Server.

  • No external database.
  • No embedding pipeline.

The customer connects to the server, and the AI immediately has access to accurate, current product documentation—sourced from the actual manual, not the web. Indexing work is done once at release time by the vendor.

How the Doc Search Works: Index Navigation, Not Vector Similarity

The approach consists of two parts: a preprocessing step (run once at release) and runtime tools exposed through the MCP server.

Preprocessing — Understand and Organize the Documentation

  • Before a release, process the entire documentation set and generate an index that captures the document structure.
  • The full documentation and this index are compiled into the MCP server.
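The preprocessing step above could be sketched roughly as follows. This is a minimal illustration, not the article's actual build tooling: it assumes the manual is plain text whose sections start with a "# " heading line, and the names `build_index` and `MANUAL` are hypothetical.

```python
import json


def build_index(doc_text: str) -> dict:
    """Split a manual into sections and build a small navigable index.

    Assumption: each section starts with a line of the form "# Title".
    """
    sections = {}
    title, body = None, []
    for line in doc_text.splitlines():
        if line.startswith("# "):
            # A new heading closes the previous section, if there was one.
            if title is not None:
                sections[title] = "\n".join(body).strip()
            title, body = line[2:].strip(), []
        elif title is not None:
            body.append(line)
    if title is not None:
        sections[title] = "\n".join(body).strip()

    # The index holds only titles and a one-line summary (the first line
    # of each section), so it stays small enough for the LLM to browse.
    index = [
        {"title": t, "summary": text.split("\n", 1)[0]}
        for t, text in sections.items()
    ]
    return {"index": index, "sections": sections}


# Hypothetical sample manual, for illustration only.
MANUAL = (
    "# Users\n"
    "Create and manage user accounts.\n"
    "Open Settings, then Users.\n"
    "\n"
    "# Billing\n"
    "View invoices.\n"
)

bundle = build_index(MANUAL)
# Serialized once at release time, ready to be embedded in the server build.
bundle_json = json.dumps(bundle)
```

The index stays separate from the section bodies so that browsing it is cheap, while each section is stored whole.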

Runtime — Tools that Let the LLM Find and Retrieve Content

  • The MCP server exposes tools that give the chat client access to the index.
  • When a customer asks a question, the LLM calls these tools to:
    1. Browse the index and identify the relevant section.
    2. Retrieve the full, intact section (not a fragment) directly from the binary.

No external calls, no database. The LLM can match the question to the right section and generate an answer based on authoritative content.
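As a rough sketch of the two runtime tools: `get_index` is the name used in the flow diagram, while `get_section`, the `BUNDLE` constant, and its contents are assumptions made for illustration. In a real build the data would be generated at release time rather than written by hand.

```python
# Hypothetical bundled data standing in for the docs compiled into the server.
BUNDLE = {
    "index": [
        {"title": "Users", "summary": "Create and manage user accounts."},
        {"title": "Billing", "summary": "View invoices."},
    ],
    "sections": {
        "Users": "Create and manage user accounts.\nOpen Settings, then Users.",
        "Billing": "View invoices.",
    },
}


def get_index() -> list:
    """Tool 1: return the index so the LLM can pick the relevant section."""
    return BUNDLE["index"]


def get_section(title: str) -> str:
    """Tool 2: return one whole section, never a fragment of it."""
    try:
        return BUNDLE["sections"][title]
    except KeyError:
        # A clear error lets the LLM go back to the index and retry.
        raise ValueError(f"Unknown section: {title!r}") from None
```

Both functions read from in-memory data only, which is what keeps the lookup free of network calls and databases.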

End‑to‑End Flow

When a customer asks “How do I create a user in ABC product?” the interaction looks like this:

Customer          Claude            MCP Server         Docs (binary)
   |                 |                   |                   |
   |  "How do I      |                   |                   |
   |  create a user?"|                   |                   |
   |---------------->|                   |                   |
   |                 |  get_index()      |                   |
   |                 |------------------>|                   |
   |                 |                   |-- read index ---->|
   |                 |                   |                   |
   |                 |                   |-- read section -->|
   |                 |                   |  Users            |
   |<----------------|                   |                   |

At no point does Claude guess or search the web. Every step is grounded in the documentation compiled into the MCP server.

Cost Comparison

Aspect                               | RAG                          | Bundled Docs in MCP
-------------------------------------|------------------------------|------------------------------
Infrastructure                       | Vector DB + embedding model  | None
Ongoing cost                         | Hosting + API calls          | Zero
Customer setup                       | Build retrieval pipeline     | Connect the server once
Retrieved content                    | Fragments, chunk boundaries  | Whole sections, intact
Accuracy on domain-specific queries  | Depends on chunking strategy | High (intent-matched index)

Deployment

Package the binary as an MCPB file—a bundle that includes the server and a manifest—and ship it with your product release. The customer imports it into their chat client once. The server runs in STDIO mode as a local subprocess alongside the chat client. No cloud, no open ports, no configuration.
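To make "STDIO mode" concrete, here is a simplified sketch of a server loop that reads newline-delimited JSON-RPC requests from standard input and writes responses to standard output. It is not a complete MCP implementation: it skips the initialization handshake, tool listing, and notifications, and a real server would more likely be built on an official MCP SDK. The `handle` and `serve` names are hypothetical.

```python
import io
import json
import sys


def handle(request: dict, tools: dict) -> dict:
    """Dispatch a single JSON-RPC "tools/call" request to a local function."""
    base = {"jsonrpc": "2.0", "id": request.get("id")}
    if request.get("method") != "tools/call":
        # -32601 is the JSON-RPC "Method not found" error code.
        return {**base, "error": {"code": -32601, "message": "Method not found"}}
    params = request.get("params", {})
    tool = tools.get(params.get("name"))
    if tool is None:
        # -32602 is the JSON-RPC "Invalid params" error code.
        return {**base, "error": {"code": -32602, "message": "Unknown tool"}}
    text = tool(**params.get("arguments", {}))
    return {**base, "result": {"content": [{"type": "text", "text": text}]}}


def serve(tools: dict, stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one JSON message per line and answer on stdout until EOF."""
    for line in stdin:
        if not line.strip():
            continue
        response = handle(json.loads(line), tools)
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()


# Example: drive the loop with in-memory streams instead of a real client.
demo_tools = {"get_section": lambda title: "Section: " + title}
demo_in = io.StringIO(json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "get_section", "arguments": {"title": "Users"}},
}) + "\n")
demo_out = io.StringIO()
serve(demo_tools, demo_in, demo_out)
```

Because the chat client launches the server as a subprocess and talks to it over these two streams, nothing has to listen on a network port.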

Takeaway

If your customers are using AI chat clients (and they are), they are already asking questions about your product. The choice is whether those answers come from your documentation or from whatever the web surfaces.

Bundling your docs into an MCP server makes the right answer the default answer, with no setup required from the customer and no ongoing infrastructure cost on your end.
