The Problem: My AWS Q Business Bot Didn’t Understand My Data
Source: Dev.to
Why Metadata Matters in Q Business
Unlike a typical RAG system where you manually control embeddings, chunking, and retrieval, AWS Q Business handles all of this automatically. But “automatic” doesn’t mean “perfect”.
Without metadata, Q struggles with:
- Prioritizing fresh vs. old content
- Understanding document categories
- Scoping answers to specific teams or contexts
- Navigating Confluence pages with nested hierarchy
- Handling versioned documents
- Distinguishing source‑of‑truth vs. duplicates
Most importantly, Q can retrieve irrelevant content that “looks similar” but isn’t actually correct. Metadata fixes that.
1. Clean Inputs: Well‑Structured Data Sources
Each data source needs:
- A clear folder/project hierarchy
- Document titles that convey meaning
- Removal of outdated versions
- Explicit version numbers when needed
- Logical grouping (S3 prefixes / Confluence spaces)
Example restructuring in S3
s3://company-knowledge-base/
    engineering/
        architecture/
            system-overview-v1.pdf
            service-boundaries-v2.md
        apis/
            public-api-spec-v3.yaml
            rate-limiting-rules-v1.pdf
        deployment/
            deployment-checklist-v3.md
            rollback-runbook-v2.md
    troubleshooting/
        common-errors/
            error-catalog-v2.json
            service-x-known-issues.md
    product/
        specs/
            feature-a-spec-v1.pdf
            feature-b-updates-v2.pdf
        roadmaps/
            q4-2025-roadmap.pdf
    operations/
        monitoring/
            alert-guide-v2.md
            oncall-playbook-v1.md
        logs/
            access-logs-structure.json
            application-log-fields.md
    knowledge/
        faq/
            internal-faq-v1.md
        glossary/
            terms-v2.md
This alone improved retrieval accuracy by ~30%.
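Removing outdated versions is easier to keep up with when something flags them for you. Here is a minimal sketch, assuming the `-vN` filename suffix used in the layout above; the function name `find_stale_versions` and the `public-api-spec-v2.yaml` key are hypothetical, added only for illustration.

```python
import re
from collections import defaultdict

# Matches a trailing "-v<number>" before the file extension, e.g. "spec-v3.yaml".
VERSION_RE = re.compile(r"^(?P<base>.+)-v(?P<version>\d+)(?P<ext>\.[^./]+)$")


def find_stale_versions(keys):
    """Return keys that have a newer "-vN" sibling under the same prefix."""
    groups = defaultdict(list)
    for key in keys:
        prefix, _, filename = key.rpartition("/")
        match = VERSION_RE.match(filename)
        if match is None:
            continue  # unversioned files are left alone
        group_id = (prefix, match["base"], match["ext"])
        groups[group_id].append((int(match["version"]), key))

    stale = []
    for versions in groups.values():
        versions.sort()  # numeric sort, so v10 ranks above v2
        stale.extend(key for _, key in versions[:-1])  # all but the highest
    return sorted(stale)


# Hypothetical listing: the v2 spec is an invented leftover for the demo.
keys = [
    "engineering/apis/public-api-spec-v2.yaml",
    "engineering/apis/public-api-spec-v3.yaml",
    "engineering/deployment/rollback-runbook-v2.md",
    "product/roadmaps/q4-2025-roadmap.pdf",
]
print(find_stale_versions(keys))
# ['engineering/apis/public-api-spec-v2.yaml']
```

Running something like this against a bucket listing before each sync gives you a delete-or-archive list instead of relying on memory.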
2. Metadata: The Secret to Making Q Business “Smart”
Q Business respects several metadata keys during retrieval.
Recommended Metadata Keys
| Key | Purpose |
|---|---|
| title | Overrides filename during ranking |
| category | Helps classification (e.g., “engineering”, “ops”) |
| tags | Multiple labels improve semantic grouping |
| version | Helps avoid outdated responses |
| updated_at | Influences recency scoring |
| department | Enables permission‑based personalization |
| summary | Used in ranking + reranking |
| source-of-truth | Boolean; strong influence on answer selection |
Example metadata attached to an S3 object
{
    "title": "ABC Execution Workflow",
    "category": "operations",
    "tags": ["abc", "execution", "workflow", "ops"],
    "version": "3.0",
    "updated_at": "2025-10-10",
    "source-of-truth": true,
    "department": "engineering",
    "summary": "Detailed ABC Process execution workflow."
}
This made Q consistently pick the correct ABC document every time.
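For S3 sources, metadata like this usually travels in a sidecar file next to the document. The sketch below converts the flat dictionary above into that shape. It rests on assumptions you should verify against the current Amazon Q Business S3 connector documentation: the `<key>.metadata.json` naming, the top-level `DocumentId` / `Title` / `Attributes` fields, and the reserved names `_category`, `_version`, and `_last_updated_at`. The helper `build_sidecar` and the object key are hypothetical.

```python
import json

# Assumed mapping from the article's keys to reserved attribute names.
RESERVED = {
    "category": "_category",
    "version": "_version",
    "updated_at": "_last_updated_at",
}


def build_sidecar(s3_key, meta):
    """Translate the flat metadata dict into a sidecar-style document."""
    attributes = {}
    for key, value in meta.items():
        if key == "title":
            continue  # the title goes in the top-level "Title" field
        if key == "updated_at":
            # Assumes a date-only "YYYY-MM-DD" value; widen it to ISO 8601.
            value = f"{value}T00:00:00Z"
        attributes[RESERVED.get(key, key)] = value
    return {"DocumentId": s3_key, "Title": meta["title"], "Attributes": attributes}


meta = {
    "title": "ABC Execution Workflow",
    "category": "operations",
    "tags": ["abc", "execution", "workflow", "ops"],
    "version": "3.0",
    "updated_at": "2025-10-10",
    "source-of-truth": True,
    "department": "engineering",
    "summary": "Detailed ABC Process execution workflow.",
}

# Hypothetical object key, used only for the demo.
s3_key = "operations/abc-execution-workflow-v3.md"
sidecar = build_sidecar(s3_key, meta)
print(f"{s3_key}.metadata.json")
print(json.dumps(sidecar, indent=2))
```

Keys that are not reserved (tags, department, summary, source-of-truth) pass through as custom attributes. As far as I can tell, those typically also need to be declared on the index before Q can use them.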
3. Indexing Controls: Chunking, Schema & Access
AWS Q Business implicitly chunks content based on structure, but you can influence it:
- Ensure documents have headings (h1, h2, h3), bullet points, numbered sections, and clear paragraphs.
- Avoid huge dense text, poorly formatted PDFs, and scanned pages without OCR.
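A quick pre-upload check can catch the worst offenders. This is a rough sketch for Markdown files only; the function name `structure_warnings` and the 200-word paragraph limit are arbitrary assumptions on my part, not anything Q Business documents.

```python
import re

# An h1-h3 Markdown heading: one to three "#" characters, a space, then text.
HEADING_RE = re.compile(r"^#{1,3}\s+\S")


def structure_warnings(markdown_text, max_paragraph_words=200):
    """Return human-readable warnings for a Markdown document."""
    warnings = []
    lines = markdown_text.splitlines()
    if not any(HEADING_RE.match(line) for line in lines):
        warnings.append("no h1-h3 headings found")

    # Paragraphs are blocks of text separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", markdown_text)
    for index, paragraph in enumerate(paragraphs, start=1):
        words = len(paragraph.split())
        if words > max_paragraph_words:
            warnings.append(f"paragraph {index} has {words} words")
    return warnings


good_doc = "# Rollback Runbook\n\n## Steps\n\n- Stop traffic\n- Revert the release\n"
dense_doc = "word " * 300

print(structure_warnings(good_doc))   # []
print(structure_warnings(dense_doc))
# ['no h1-h3 headings found', 'paragraph 1 has 300 words']
```

PDFs and scanned pages need a different tool, but the idea is the same: reject or fix poorly structured documents before they reach the index.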
Provide a Schema for Structured Data
{
    "type": "object",
    "properties": {
        "step_name": { "type": "string" },
        "description": { "type": "string" },
        "owner": { "type": "string" },
        "timestamp": { "type": "string" }
    }
}
This is especially useful for logs or other structured data sources.
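It also helps to check records against the schema before they land in the data source. Below is a deliberately tiny validator, not a full JSON Schema implementation. It only understands the string properties shown above, and it treats every property as required, which is stricter than the schema itself and is my own assumption. The name `schema_errors` and the sample record are hypothetical.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "step_name": {"type": "string"},
        "description": {"type": "string"},
        "owner": {"type": "string"},
        "timestamp": {"type": "string"},
    },
}

# Only the JSON types that appear in the schema above are supported.
PYTHON_TYPES = {"string": str, "object": dict}


def schema_errors(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    if not isinstance(record, dict):
        return ["record is not an object"]
    errors = []
    for name, spec in schema["properties"].items():
        if name not in record:
            errors.append(f"missing field: {name}")  # assumption: all fields required
        elif not isinstance(record[name], PYTHON_TYPES[spec["type"]]):
            errors.append(f"{name} should be a {spec['type']}")
    return errors


# Hypothetical log record with one missing and one mistyped field.
record = {"step_name": "validate-input", "description": "Checks payload", "owner": 42}
print(schema_errors(record))
# ['owner should be a string', 'missing field: timestamp']
```

For anything beyond a flat object like this one, a real JSON Schema library is the better choice.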
My Final Setup That Worked Amazingly Well
- S3 with Clean Structure – Organized by domains → modules → versions.
- Confluence with Proper Page Hierarchy – Q understands “parent → child → sub‑page” when the hierarchy is clean.
- Role‑Based Access – Users receive personalized answers based on IAM roles.
- Scheduled Re‑indexing – Runs after every source update.
- Content Freshness / Sync – Sync strategy aligned with the content update process.
- Metadata on Every Document – title, tags, category, version, updated_at, summary.
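The re-indexing step in the setup above can be triggered from whatever job already updates your sources. This is a sketch, assuming the boto3 `qbusiness` client and its `start_data_source_sync_job` call; the wrapper name `start_sync` and the placeholder IDs are hypothetical. The client is passed in so the function can be exercised without AWS credentials.

```python
def start_sync(client, application_id, index_id, data_source_id):
    """Start a data source sync job and return its execution ID."""
    response = client.start_data_source_sync_job(
        applicationId=application_id,
        indexId=index_id,
        dataSourceId=data_source_id,
    )
    return response["executionId"]


# Example wiring (needs boto3 and AWS credentials, so it is left commented out):
#
#   import boto3
#   client = boto3.client("qbusiness")
#   execution_id = start_sync(client, "YOUR_APP_ID", "YOUR_INDEX_ID", "YOUR_DATA_SOURCE_ID")
#   print(execution_id)
```

Calling this at the end of the pipeline that publishes documents keeps the index in step with the content, rather than waiting for the next scheduled sync.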
What I Learned
- Q isn’t truly “no configuration needed”; smart metadata is everything.
- Hierarchy and structure matter more than sheer quantity.
- Recency metadata keeps Q from answering with outdated content.
- source-of-truth: true is extremely powerful.
- Q Business is excellent, but only if your inputs are clean.
Conclusion
I initially thought AWS Q Business wasn’t retrieving the right data. Turns out I wasn’t feeding it the right structure. Once I fixed the data sources and metadata:
- Retrieval accuracy improved drastically
- Domain‑specific answers became sharp
- Version conflicts vanished
- Hallucinations dropped significantly
If you’re using AWS Q Business for enterprise search or internal assistants, your metadata and indexing strategies determine the quality of the AI.