Ship Your Product Documentation Into Your Customer's Chat Client

Published: March 16, 2026 at 06:38 AM EDT

Source: Dev.to

Fight Hallucination

Modern LLM chat clients can search the web. That works for general knowledge, but it is a problem for your product: internal documentation isn't public, so web search can't find it, and for what is public, search returns whatever ranks highest, which is not necessarily the authoritative version for the release your customer is running.

Result: customers ask their AI assistant about your product and receive plausible‑sounding wrong answers.
Fix: give the AI access to your actual documentation at query time.

Why Not RAG?

RAG (Retrieval‑Augmented Generation) requires either:

The customer to build and maintain a retrieval pipeline, or
The vendor to host one—a vector DB, embedding model, and retrieval API running 24/7 for a doc corpus that changes only a few times a year.

Either way, the infrastructure cost is disproportionate to the problem.

Design Decision: Bundle the Docs Into the MCP Server

MCP (Model Context Protocol) is a standard for connecting tools and data sources to AI chat clients.

Decision: compile the documentation directly into the MCP server.

No external database.
No embedding pipeline.

The customer connects to the server, and the AI immediately has access to accurate, current product documentation, sourced from the actual manual rather than the web. The vendor does the indexing work once, at release time.

How the Doc Search Works: Index Navigation, Not Vector Similarity

The approach consists of two parts: a preprocessing step, run once per release, and runtime tools exposed through the MCP server.

Preprocessing — Understand and Organize the Documentation

Before a release, process the entire documentation set and generate an index that captures the document structure.
The full documentation and this index are compiled into the MCP server.
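A minimal sketch of the preprocessing step, assuming the docs are Markdown files and that every heading starts a section. It splits each page at its headings, gives each section an id (page path plus a slug of the title), and keeps the section body whole so it can later be returned intact. The input format, the id scheme, and the `build_bundle` name are assumptions for illustration; the article's index may carry more than headings (the comparison table calls it intent-matched), which this sketch does not attempt. It also does not skip `#` lines inside code fences.

```python
import json
import re
from typing import Dict

HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")  # ATX-style Markdown headings


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_bundle(pages: Dict[str, str]) -> dict:
    """Split Markdown pages at headings into an index plus whole section bodies.

    Each section runs from its heading to the next heading of any level.
    Text before a page's first heading is ignored in this sketch.
    """
    index, bodies = [], {}
    for path, text in sorted(pages.items()):
        current = None
        for line in text.splitlines():
            match = HEADING.match(line)
            if match:
                base = f"{path}#{slugify(match.group(2))}"
                current, n = base, 2
                while current in bodies:  # keep ids unique for repeated titles
                    current, n = f"{base}-{n}", n + 1
                index.append({"id": current, "title": match.group(2),
                              "level": len(match.group(1))})
                bodies[current] = [line]
            elif current is not None:
                bodies[current].append(line)
    sections = {sid: "\n".join(lines).strip() for sid, lines in bodies.items()}
    return {"index": index, "sections": sections}


if __name__ == "__main__":
    # Hypothetical sample page; a release build would walk the real docs tree
    # and write json.dumps(bundle) to a file that is compiled into the server.
    pages = {"admin/users.md": "# Users\n\n## Create a user\nSample steps.\n"}
    print(json.dumps(build_bundle(pages)["index"], indent=2))
```

Keeping the heading level in each index entry is enough to rebuild the outline at runtime, and storing bodies separately from the index lets the server hand the model the small index first and a full section only on request.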

Runtime — Tools that Let the LLM Find and Retrieve Content

The MCP server exposes tools that give the chat client access to the index and to the documentation itself.
When a customer asks a question, the LLM calls these tools to:
1. Browse the index and identify the relevant section.
2. Retrieve the full, intact section (not a fragment) directly from the server binary.

No external calls, no database. Because the LLM navigates a structured index instead of relying on vector similarity, it can match the question to the right section and answer from authoritative content.
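The two runtime tools can be sketched as plain functions over the bundled data. `get_index` is the tool name used in the flow diagram below; `get_section`, the sample ids, and the sample text are hypothetical. The index comes back as an outline indented by heading level, so the model can scan the whole structure in one call, and a section comes back whole rather than chunked.

```python
from typing import Dict, List

# Hypothetical bundle built at release time (sample ids and text, not real docs).
INDEX: List[dict] = [
    {"id": "admin/users.md#users", "title": "Users", "level": 1},
    {"id": "admin/users.md#create-a-user", "title": "Create a user", "level": 2},
    {"id": "admin/users.md#delete-a-user", "title": "Delete a user", "level": 2},
]
SECTIONS: Dict[str, str] = {
    "admin/users.md#users": "# Users\nSample overview text.",
    "admin/users.md#create-a-user": "## Create a user\nSample steps for creating a user.",
    "admin/users.md#delete-a-user": "## Delete a user\nSample steps for deleting a user.",
}


def get_index() -> str:
    """Tool 1: the full table of contents as an outline indented by heading level.

    The model reads this outline and decides which section answers the question.
    """
    return "\n".join(
        "  " * (entry["level"] - 1) + f'{entry["title"]} [{entry["id"]}]' for entry in INDEX
    )


def get_section(section_id: str) -> str:
    """Tool 2 (hypothetical name): return one section whole, never a chunk of it."""
    body = SECTIONS.get(section_id)
    if body is None:
        # Point the model back to the index instead of returning nothing.
        return f"No section with id {section_id!r}; call get_index to list valid ids."
    return body
```

Registering these functions as MCP tools is left to whichever MCP SDK you use. In a compiled server the bundle could be embedded at build time, for example with Go's `embed` package or Rust's `include_str!`.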

End‑to‑End Flow

When a customer asks Claude “How do I create a user in ABC product?”, the interaction looks like this:

Customer          Claude            MCP Server         Docs (binary)
   |                 |                   |                   |
   |  "How do I      |                   |                   |
   |  create a user?"|                   |                   |
   |---------------->|                   |                   |
   |                 |  get_index()      |                   |
   |                 |------------------>|                   |
   |                 |                   |-- read index ---->|
   |                 |                   |                   |
   |                 |                   |-- read section -->|
   |                 |                   |  Users            |
   |<----------------|                   |                   |

At no point in this flow does Claude guess or fall back to web search. Every step is grounded in the documentation compiled into the MCP server.
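The flow above comes down to two tool calls with a decision in between, and the decision belongs to the model. The sketch below wires that up with the tools passed in as plain callables. `naive_choose` is a crude word-overlap stand-in for the LLM's judgment, included only so the example runs; it is not how the server or the model is meant to pick sections. All names and the sample data are hypothetical.

```python
import re
from typing import Callable, Dict, Tuple

Index = Dict[str, str]  # section id -> title, as returned by the index tool


def words(text: str) -> set:
    """Lowercase alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def naive_choose(question: str, index: Index) -> str:
    """Stand-in for the LLM: pick the title sharing the most words with the question."""
    q = words(question)
    return max(index, key=lambda sid: len(q & words(index[sid])))


def run_flow(
    question: str,
    get_index: Callable[[], Index],
    get_section: Callable[[str], str],
    choose: Callable[[str, Index], str] = naive_choose,
) -> Tuple[str, str]:
    index = get_index()                   # tool call 1: browse the index
    section_id = choose(question, index)  # the model picks a section from the titles
    section = get_section(section_id)     # tool call 2: fetch the whole section
    return section_id, section            # the model then answers from this text


if __name__ == "__main__":
    # Hypothetical sample docs, not real product content.
    index = {"users#create": "Users > Create a user", "users#delete": "Users > Delete a user"}
    sections = {"users#create": "## Create a user\n(sample text)",
                "users#delete": "## Delete a user\n(sample text)"}
    print(run_flow("How do I create a user in ABC product?", lambda: index, sections.__getitem__))
```

Passing `choose` in as a parameter keeps the server side honest: the server only lists and returns sections, and the ranking happens in the client's model.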

Cost Comparison

Aspect	RAG	Bundled Docs in MCP
Infrastructure	Vector DB + embedding model + retrieval API	None
Ongoing cost	Hosting + API calls	None at runtime (indexing runs once per release)
Customer setup	Build retrieval pipeline	Connect the server once
Retrieved content	Fragments cut at chunk boundaries	Whole sections, intact
Accuracy on domain-specific queries	Depends on chunking strategy	High, via an intent-matched index

Deployment

Package the binary as an MCPB file (a bundle that includes the server and a manifest) and ship it with your product release. The customer imports it into their chat client once. The chat client then launches the server as a local subprocess and talks to it over STDIO (standard input and output). No cloud, no open ports, no configuration.
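To make STDIO mode concrete: the chat client starts the server process and exchanges newline-delimited JSON-RPC 2.0 messages over its stdin and stdout. Below is a hand-rolled sketch of that loop handling only `initialize`, `tools/list`, and `tools/call`. It exposes two hypothetical tools, `get_index` and `get_section`, over a one-section sample; the server name, version, and protocol version string are assumptions to check against the spec revision your client uses. A production server would normally use an official MCP SDK instead of parsing messages by hand.

```python
import json
import sys

# Hypothetical sample data standing in for the docs compiled into the server.
INDEX = [("admin/users.md#create-a-user", "Create a user")]
SECTIONS = {"admin/users.md#create-a-user": "## Create a user\nSample steps."}

# Assumption: a protocol revision string your client accepts; check the spec.
PROTOCOL_VERSION = "2025-06-18"

TOOLS = [
    {"name": "get_index",
     "description": "List documentation sections as id and title.",
     "inputSchema": {"type": "object", "properties": {}}},
    {"name": "get_section",
     "description": "Return one whole documentation section by id.",
     "inputSchema": {"type": "object",
                     "properties": {"section_id": {"type": "string"}},
                     "required": ["section_id"]}},
]
TOOL_NAMES = {tool["name"] for tool in TOOLS}


def call_tool(name, arguments):
    """Run a known tool and return (text, is_error)."""
    if name == "get_index":
        return "\n".join(f"{sid}\t{title}" for sid, title in INDEX), False
    body = SECTIONS.get(arguments.get("section_id", ""))
    if body is None:
        return "Unknown section id; call get_index first.", True
    return body, False


def handle(message):
    """Turn one JSON-RPC request into a response dict; notifications get None."""
    if "id" not in message:  # e.g. notifications/initialized: no reply allowed
        return None
    reply = {"jsonrpc": "2.0", "id": message["id"]}
    method = message.get("method")
    params = message.get("params") or {}
    if method == "initialize":
        reply["result"] = {"protocolVersion": PROTOCOL_VERSION,
                           "capabilities": {"tools": {}},
                           "serverInfo": {"name": "product-docs", "version": "1.0.0"}}
    elif method == "tools/list":
        reply["result"] = {"tools": TOOLS}
    elif method == "tools/call" and params.get("name") in TOOL_NAMES:
        text, is_error = call_tool(params["name"], params.get("arguments") or {})
        reply["result"] = {"content": [{"type": "text", "text": text}], "isError": is_error}
    elif method == "tools/call":
        reply["error"] = {"code": -32602, "message": "Unknown tool"}
    else:
        reply["error"] = {"code": -32601, "message": "Method not found"}
    return reply


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """STDIO transport: read one JSON message per line, write one per line."""
    for line in stdin:
        if not line.strip():
            continue
        reply = handle(json.loads(line))  # sketch: malformed JSON is not handled
        if reply is not None:
            # json.dumps without indent escapes newlines, so each reply stays on one line
            stdout.write(json.dumps(reply) + "\n")
            stdout.flush()
```

An executable entry point would simply call `serve()`; nothing listens on a port, which is why no network configuration is needed.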

Takeaway

If your customers are using AI chat clients (and they are), they are already asking questions about your product. The choice is whether those answers come from your documentation or from whatever the web surfaces.

Bundling your docs into an MCP server makes the right answer the default answer, with no setup required from the customer and no ongoing infrastructure cost on your end.
