Ship Your Product Documentation Into Your Customers' Chat Clients
Source: Dev.to
Fight Hallucination
Modern LLM chat clients search the web. That works for general knowledge, but it fails for your product: internal documentation isn’t public, and for the parts that are, web search returns whatever ranks highest, not necessarily the authoritative version your customer is using.
Result: customers ask their AI assistant about your product and receive plausible‑sounding wrong answers.
Fix: give the AI access to your actual docs with correct info at query time.
Why Not RAG?
RAG (Retrieval‑Augmented Generation) requires either:
- The customer to build and maintain a retrieval pipeline, or
- The vendor to host one—a vector DB, embedding model, and retrieval API running 24/7 for a doc corpus that changes only a few times a year.
Either way, the infrastructure cost is disproportionate to the problem.
Design Decision: Bundle the Docs Into the MCP Server
MCP (Model Context Protocol) is a standard for connecting tools and data sources to AI chat clients.
Decision: compile the documentation directly into the MCP Server.
- No external database.
- No embedding pipeline.
The customer connects to the server, and the AI immediately has access to accurate, current product documentation—sourced from the actual manual, not the web. Indexing work is done once at release time by the vendor.
How the Doc Search Works: Index Navigation, Not Vector Similarity
The approach consists of two parts: a preprocessing step (run once at release) and runtime tools exposed through the MCP server.
Preprocessing — Understand and Organize the Documentation
- Before a release, process the entire documentation set and generate an index that captures the document structure.
- The full documentation and this index are compiled into the MCP server.
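As a rough illustration of the preprocessing step, the sketch below splits a Markdown manual into sections and derives a compact index from the headings. It assumes the source docs are Markdown; the `s0`, `s1` ID scheme, the first-line summary, and the sample manual are all hypothetical choices made for this example, not the author's actual indexer. It also treats each section as the text up to the next heading and does not skip headings inside code fences.

```python
import json
import re

# Matches Markdown ATX headings such as "# Users" or "## Creating a user".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

def split_sections(markdown: str) -> list[dict]:
    """Split a Markdown manual into sections, one per heading."""
    sections = []
    current = None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            current = {"level": len(match.group(1)), "title": match.group(2), "body": []}
            sections.append(current)
        elif current is not None:
            current["body"].append(line)
    for i, section in enumerate(sections):
        section["id"] = f"s{i}"  # hypothetical ID scheme
        section["body"] = "\n".join(section["body"]).strip()
    return sections

def build_index(sections: list[dict]) -> list[dict]:
    """Keep only what the LLM needs to pick a section: id, depth, title, short summary."""
    return [
        {
            "id": s["id"],
            "level": s["level"],
            "title": s["title"],
            "summary": s["body"].split("\n", 1)[0][:120],  # first line as a stand-in summary
        }
        for s in sections
    ]

# Made-up sample manual, for illustration only.
MANUAL = """# Users
Create, update, and delete user accounts.

## Creating a user
Open Admin > Users and choose New.

# Billing
Invoices and payment methods.
"""

sections = split_sections(MANUAL)
index = build_index(sections)
# One JSON blob that a build step could embed in the server binary.
bundle = json.dumps({"index": index, "sections": {s["id"]: s["body"] for s in sections}})
```

The index stays small enough to hand to the model in a single tool call, while the section bodies are only read when a specific ID is requested.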
Runtime — Tools that Let the LLM Find and Retrieve Content
- The MCP server exposes tools that give the chat client access to the index.
- When a customer asks a question, the LLM calls these tools to:
  - Browse the index and identify the relevant section.
  - Retrieve the full, intact section (not a fragment) directly from the binary.
No external calls, no database. The LLM can match the question to the right section and generate an answer based on authoritative content.
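The two runtime tools could look roughly like the plain functions below. This is a minimal sketch under stated assumptions: `get_index` is the name used in the flow diagram, while `get_section`, its `section_id` parameter, and the sample `INDEX` and `SECTIONS` data are hypothetical. Registering the functions with an MCP SDK is left out.

```python
# Data that the release build would embed in the server (made-up sample content).
INDEX = [
    {"id": "users", "title": "Users", "summary": "Create and manage user accounts."},
    {"id": "billing", "title": "Billing", "summary": "Invoices and payment methods."},
]
SECTIONS = {
    "users": "To create a user, open Admin > Users and choose New.",
    "billing": "Invoices are issued on the first day of each month.",
}

def get_index() -> str:
    """Return a compact outline the LLM can scan to choose a section."""
    return "\n".join(f'{e["id"]}: {e["title"]} - {e["summary"]}' for e in INDEX)

def get_section(section_id: str) -> str:
    """Return one section in full (hypothetical tool name and parameter)."""
    try:
        return SECTIONS[section_id]
    except KeyError:
        # Tell the model which ids exist so it can retry instead of guessing.
        valid = ", ".join(sorted(SECTIONS))
        return f"Unknown section '{section_id}'. Valid ids: {valid}"
```

Both functions are dictionary lookups over in-memory data, which is why the server needs no database or network access at query time.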
End‑to‑End Flow
When a customer asks “How do I create a user in ABC product?” the interaction looks like this:
Customer          Claude              MCP Server          Docs (binary)
   |                 |                    |                    |
   | "How do I       |                    |                    |
   | create a user?" |                    |                    |
   |---------------->|                    |                    |
   |                 | get_index()        |                    |
   |                 |------------------->|                    |
   |                 |                    |-- read index ----->|
   |                 |                    |                    |
   |                 |                    |-- read section --->|
   |                 |                    |      Users         |
   |<----------------|                    |                    |

At no point does Claude guess or search the web. Every step is grounded in the documentation compiled into the MCP server.
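On the wire, MCP tool calls are JSON-RPC 2.0 messages using the `tools/call` method. The sketch below builds the messages that could make up the exchange in this flow. The `get_section` tool name, its `section_id` argument, and the result texts are assumptions made for illustration; only `get_index` appears in the diagram.

```python
import json

def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Client -> server: ask the MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

def tool_result(request_id: int, text: str) -> dict:
    """Server -> client: tool output returned as a text content item."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }

exchange = [
    tool_call(1, "get_index", {}),
    tool_result(1, "users: Users"),
    tool_call(2, "get_section", {"section_id": "users"}),  # hypothetical tool
    tool_result(2, "(full text of the Users section)"),
]

for message in exchange:
    print(json.dumps(message))
```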
Cost Comparison
| Aspect | RAG | Bundled Docs in MCP |
|---|---|---|
| Infrastructure | Vector DB + embedding model | None |
| Ongoing cost | Hosting + API calls | Zero |
| Customer setup | Build retrieval pipeline | Connect the server once |
| Retrieved content | Fragments, chunk boundaries | Whole sections, intact |
| Accuracy on domain‑specific queries | Depends on chunking strategy | High — intent‑matched index |
Deployment
Package the binary as an MCPB file—a bundle that includes the server and a manifest—and ship it with your product release. The customer imports it into their chat client once. The server runs in STDIO mode as a local subprocess alongside the chat client. No cloud, no open ports, no configuration.
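To make "STDIO mode" concrete: the chat client starts the server as a child process and exchanges newline-delimited JSON-RPC messages over its stdin and stdout. The loop below is a deliberately stripped-down sketch of that idea, not a complete MCP server. It skips the initialization handshake, tool listing, and notification handling that a real server (normally built on an MCP SDK) must implement, and the hard-coded index text is a placeholder.

```python
import io
import json

def handle(request: dict) -> dict:
    """Answer one JSON-RPC request; only a get_index tool call is supported here."""
    params = request.get("params", {})
    if request.get("method") == "tools/call" and params.get("name") == "get_index":
        result = {"content": [{"type": "text", "text": "users: Users"}]}
        return {"jsonrpc": "2.0", "id": request["id"], "result": result}
    # -32601 is the standard JSON-RPC "Method not found" error code.
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "error": {"code": -32601, "message": "Method not found"},
    }

def serve(inp, out) -> None:
    """Read one JSON message per line and write one JSON reply per line."""
    for line in inp:
        if not line.strip():
            continue
        out.write(json.dumps(handle(json.loads(line))) + "\n")
        out.flush()

# Demo with in-memory streams; a real entry point would call
# serve(sys.stdin, sys.stdout) so the chat client can drive the process.
request = {"jsonrpc": "2.0", "id": 7, "method": "tools/call",
           "params": {"name": "get_index", "arguments": {}}}
fake_stdin = io.StringIO(json.dumps(request) + "\n")
fake_stdout = io.StringIO()
serve(fake_stdin, fake_stdout)
```

Because all traffic flows through the process's own pipes, nothing listens on a network port.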
Takeaway
If your customers are using AI chat clients (and they are), they are already asking questions about your product. The choice is whether those answers come from your documentation or from whatever the web surfaces.
Bundling your docs into an MCP server makes the right answer the default answer, with no setup required from the customer and no ongoing infrastructure cost on your end.