I Contributed to Chroma's Open-Source Docs — Here's What I Changed and What I Learned

Published: (June 10, 2026 at 01:50 PM EDT)
4 min read
Source: Dev.to

Chroma is one of the most widely used open-source vector databases in the AI ecosystem. It’s the backbone of countless RAG pipelines, semantic search systems, and AI agent memory layers, including projects I’ve built myself. So when I found issue #3111 on their GitHub, a request to update their Haystack integration documentation, I decided to take it on.

The issue was straightforward: the existing docs were outdated, lacked clear installation steps, and had almost no practical examples for ChromaDocumentStore. But when I looked closer, I found two already-open PRs. That’s where it got interesting.

Before writing anything, I checked what others had already done. There were two open PRs for this exact issue:

PR #3112: Deleted the entire documentation file. The contributor decided the best way to “update” the docs was to remove them entirely. This doesn’t help users; it just creates a dead link.

PR #7229 (manikanta7cheruku): Fixed a Path bug in the code example and swapped the LLM from HuggingFace (Mixtral) to OpenAI (GPT-3.5-turbo). The Path fix was good. The LLM swap was questionable — it replaced a free, self-hosted model with one that requires a paid API key, which is a worse default for open-source documentation.

Neither PR actually addressed the core request: comprehensive, clear examples for ChromaDocumentStore. So I wrote my own.

I rewrote docs/mintlify/integrations/frameworks/haystack.mdx from 80 lines to 187 lines. Here’s what was added:

The original docs only had pip install chroma-haystack. I added haystack-ai as an explicit dependency since the integration package doesn’t always pull it in correctly.

I also added a minimal 10-line example showing Document creation, writing, and counting, which is the fastest way to verify your setup works:

from haystack import Document
from haystack_integrations.document_stores.chroma import ChromaDocumentStore

# No arguments: the store is created with its defaults
document_store = ChromaDocumentStore()

docs = [
    Document(content="Chroma stores documents as vectors."),
    Document(content="Haystack provides the orchestration layer."),
]
document_store.write_documents(docs)
print(document_store.count_documents())  # Output: 2
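Because the install step depends on two packages resolving correctly, a quick check before running the example can save a confusing traceback. The sketch below uses only the standard library and is not part of the PR. The helper name `missing_modules` is hypothetical, and the module paths are the ones quoted in this post.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the dotted module names that cannot be resolved in this environment."""
    missing = []
    for name in names:
        try:
            # find_spec returns None when a top-level module is absent.
            spec = find_spec(name)
        except ImportError:
            # For dotted names, a missing parent package raises ModuleNotFoundError,
            # which is a subclass of ImportError.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = [
        "haystack",  # import name used by the haystack-ai package in this post
        "haystack_integrations.document_stores.chroma",
    ]
    # An empty list means both imports used in the quick start should resolve.
    print(missing_modules(required))
```

If the second name shows up in the result while the first does not, the integration package is the likely culprit rather than Haystack itself.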

The original docs only showed in-memory mode. I documented all three: in-memory (development), persistent storage (disk-backed), and remote client-server (production with Docker or chroma run).

The original code had "data" / Path(name), with a plain string on the left of the / operator. That form only evaluates because pathlib defines the reflected operator, and it raises a TypeError as soon as both operands are plain strings. I changed it to Path("data") / name, which keeps a Path on the left.

I added three retrieval patterns: text query (auto-embedded by Chroma), embedding query (pre-computed vectors), and metadata filtering using Haystack’s filter syntax.

I added a complete end-to-end RAG example connecting ChromaDocumentStore → ChromaQueryRetriever → PromptBuilder → HuggingFaceTGIGenerator. I kept HuggingFace (free, self-hosted) instead of swapping to OpenAI.

I also covered common issues: import errors, empty results, remote connection refused, and HuggingFace token errors, each with a one-line fix.

Two people had already worked on this issue before me. One deleted the file. One did minimal fixes. If I hadn’t checked, I might have duplicated effort or missed context. Always review what’s already in flight.

The old docs used from chroma_haystack import ChromaDocumentStore, which is the archived package. The current one is from haystack_integrations.document_stores.chroma import ChromaDocumentStore. Same class, different package. This is exactly the kind of thing that makes docs go stale.

I tried git push origin my-branch and got a 403. You have to fork the repo first, push to your fork, then create the PR with gh pr create --repo chroma-core/chroma --head SaintChris:my-branch. Simple, but not obvious if you haven’t contributed to a large project before.

A broken code example in documentation is the same as a broken function: it wastes hours of developer time. The path expression in the original docs was one small adaptation away from a TypeError for anyone who copied the example. Fixing that one line probably saves more developer hours than most code contributions.

PR #7230 is now open on the chroma-core/chroma repository. It’s documentation-only, with no code changes. The branch is at SaintChris/chroma.

Whether it gets merged or not, the process itself was the value. I now understand how Chroma’s deployment modes work at a deeper level, I’ve read their documentation pipeline, and I have a contribution on one of the most widely used vector databases in the AI ecosystem.

Open-source documentation is infrastructure. When it’s wrong, every developer who copies that example hits a wall. When it’s incomplete, they spend hours reading source code to figure out what the docs should have said.

Contributing to docs isn’t glamorous. But for a project like Chroma, which thousands of developers depend on for RAG pipelines, agent memory, and semantic search, a clear, comprehensive integration guide is worth more than most feature PRs.
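The path fix discussed earlier is easy to check in isolation. Here is a minimal sketch that uses only the standard library. The file name `notes.txt` is a made-up placeholder, not a file from the Chroma docs.

```python
from pathlib import Path

name = "notes.txt"  # hypothetical file name, for illustration only

# Conventional form: a Path on the left, a string on the right.
fixed = Path("data") / name
assert fixed == Path("data", name)

# pathlib also defines the reflected operator, so a string on the left
# still evaluates as long as the right operand is a Path.
reflected = "data" / Path(name)
assert reflected == fixed

# With two plain strings there is no Path involved, and "/" is undefined for str.
try:
    "data" / name
    raised = False
except TypeError:
    raised = True

print(fixed.as_posix(), raised)  # prints: data/notes.txt True
```

Keeping the Path on the left makes the intent obvious to readers and keeps the expression working when someone later swaps the right-hand side for another plain string.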
