Creating an AI-enabled Slackbot with an AWS Bedrock Knowledge Base
One of the lowest-friction, highest-ROI applications of large language models (LLMs) so far has been the internal AI assistant.
Yes, AI doesn’t have to be all about customer‑facing chatbots or fully autonomous agents. A simple interface that lets users ask questions like the following can be a powerful tool:
- “How do I deploy this service?”
- “What’s the on‑call runbook for this alert?”
- “Where is the latest diagram for the design doc?”
These questions already have answers — scattered across Confluence pages, Google Docs, GitHub READMEs, and Slack threads. The problem isn’t generation. It’s retrieval.
Out of the box, LLMs are great at reasoning and summarisation, but they’re completely disconnected from your organisation’s institutional knowledge. Prompt stuffing helps a bit. Fine‑tuning helps in very narrow cases. Neither scales when your knowledge base changes weekly, or when correctness actually matters.
This is the void that Retrieval-Augmented Generation (RAG) fills.
RAG bridges the gap between probabilistic language models and deterministic internal knowledge. Instead of asking an LLM to guess, you:
- Retrieve relevant documents first.
- Ask the model to synthesise an answer grounded in that context.
The result is an assistant that feels intelligent without being reckless — and, crucially, one that stays up‑to‑date without constant retraining.
If you're already on AWS, Amazon Bedrock Knowledge Bases provides an easy way to create, deploy, and integrate a RAG pipeline into your existing infrastructure. In this post we'll walk through how to set up an AWS Bedrock Knowledge Base and connect it to a Slack bot for a realistic internal, AI-enabled assistant use case.

Setting up AWS Bedrock Knowledge Base
1. Open the AWS console → Amazon Bedrock → Build → Knowledge Bases.
2. As of the time of writing, a Bedrock knowledge base can be backed by:
- a custom vector store,
- an Amazon Kendra GenAI index, or
- a structured data store (e.g., databases, tables).
Since most internal data is unstructured (Confluence docs, markdown files, etc.), choose "Create knowledge base with vector store."
3. Choose a data source. Options are currently limited to a handful: Amazon S3, Confluence, Salesforce, SharePoint, and the Web Crawler. For this demo we'll use Confluence. Store the Confluence credentials in AWS Secrets Manager as described in the official guide.
4. Configure parsing & chunking:
- Choose either the AWS default parser or a foundation model (e.g., Claude) as a parser.
- Select a chunking strategy for the vector database.
Bedrock will automatically chunk documents, generate embeddings, and store vectors in OpenSearch Serverless. For a quick demo, keep the defaults and use Amazon Titan embeddings.
5. Sync the data source. After the vector store is created, manually trigger a sync (or trigger it programmatically, as sketched below). You can later add other sources (SharePoint PDFs, open-source library docs, internal S3 files, etc.).
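If you'd rather not click "Sync" in the console every time the Confluence space changes, the same ingestion job can be started from code. A minimal boto3 sketch; the region and the knowledge base / data source IDs are placeholders you'd copy from the console:

```python
import boto3

# The "bedrock-agent" client owns the knowledge base and ingestion (sync) APIs.
bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

KB_ID = "XXXXXXXXXX"           # placeholder: Knowledge Base ID from the console
DATA_SOURCE_ID = "YYYYYYYYYY"  # placeholder: Confluence data source ID

# Start an ingestion job (the same thing the console's "Sync" button does).
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=KB_ID,
    dataSourceId=DATA_SOURCE_ID,
    description="Sync triggered from script",
)

# Check on the job; the status moves from STARTING/IN_PROGRESS to COMPLETE or FAILED.
job_id = job["ingestionJob"]["ingestionJobId"]
status = bedrock_agent.get_ingestion_job(
    knowledgeBaseId=KB_ID,
    dataSourceId=DATA_SOURCE_ID,
    ingestionJobId=job_id,
)["ingestionJob"]["status"]
print(f"Ingestion job {job_id}: {status}")
```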
Setting up a Slack bot
- Create a Slack App in the Slack Admin Console.
- Enable Socket Mode.
- Add the minimal OAuth scopes: chat:write, app_mentions:read, channels:history.
- Grab the Bot Token from the OAuth & Permissions page, plus the App-Level Token generated when you enabled Socket Mode.
Bot implementation (Python + Slack Bolt)
We’ll use the Slack Bolt SDK to spin up a bot that:
- Parses Slack events (mentions, slash commands, etc.)
- Queries the Bedrock Knowledge Base
- Generates a response

Pseudocode
import json
import os

import boto3

# Two clients: "bedrock-agent-runtime" for Knowledge Base retrieval, "bedrock-runtime" for model invocation.
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")
bedrock_runtime = boto3.client("bedrock-runtime")

KB_ID = os.environ["BEDROCK_KB_ID"]
MODEL_ID = "arn:aws:bedrock:us-east-1:...:inference-profile/us.anthropic.claude-sonnet-4-5-20250929-v1:0"

def handler(event, context):
    # 1️⃣ Extract the user message from the Slack event
    text = extract_slack_message(event)

    # 2️⃣ Retrieve relevant chunks from the Knowledge Base
    retrieval = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=KB_ID,
        retrievalQuery={"text": text},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": 5}
        },
    )

    # 3️⃣ Build the prompt that combines the user query with retrieved context
    prompt = build_prompt(
        user_question=text,
        retrieved_chunks=retrieval["retrievalResults"],
    )

    # 4️⃣ Invoke the LLM (e.g., Claude Sonnet) via Bedrock Runtime
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    answer = json.loads(response["body"].read())["content"][0]["text"]

    # 5️⃣ Post the answer back to Slack
    post_to_slack(answer)
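The handler above is sketched Lambda-style. With Socket Mode enabled, one straightforward way to run it is inside a Slack Bolt app. This is a minimal sketch, assuming the SLACK_BOT_TOKEN and SLACK_APP_TOKEN environment variables hold the two tokens from the previous section, and that answer_question is a hypothetical wrapper around the retrieve/prompt/invoke steps:

```python
import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

# The Bot Token (xoxb-...) authenticates Web API calls; the App-Level Token (xapp-...) opens the socket.
app = App(token=os.environ["SLACK_BOT_TOKEN"])


@app.event("app_mention")
def on_mention(event, say):
    # answer_question is a hypothetical wrapper around steps 2-4 of the handler above.
    answer = answer_question(event["text"])
    # Reply in the same thread as the mention.
    say(text=answer, thread_ts=event.get("thread_ts", event["ts"]))


if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```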
Tuning for performance
Because LLMs are non‑deterministic, we should guide them with a clear system prompt. While RAG supplies the “internal” knowledge, prompt engineering improves relevance and safety.
Prompt template
You are an internal engineering assistant.
Answer the question using ONLY the provided context.
If the answer is not in the context, say you do not know.
{{retrieved_chunks}}
Question: {{user_question}}
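The build_prompt helper referenced in the pseudocode simply fills in this template. A minimal sketch, assuming each retrieved item follows the retrieve API's result shape (a content.text field, plus a location block whose exact keys depend on the data source type):

```python
PROMPT_TEMPLATE = """You are an internal engineering assistant.
Answer the question using ONLY the provided context.
If the answer is not in the context, say you do not know.

Context:
{retrieved_chunks}

Question: {user_question}"""


def build_prompt(user_question: str, retrieved_chunks: list) -> str:
    # Flatten each retrieved chunk into a numbered block of plain text, keeping the source URL if present.
    context_blocks = []
    for i, chunk in enumerate(retrieved_chunks, start=1):
        text = chunk["content"]["text"]
        source = chunk.get("location", {}).get("confluenceLocation", {}).get("url", "unknown")
        context_blocks.append(f"[{i}] (source: {source})\n{text}")

    return PROMPT_TEMPLATE.format(
        retrieved_chunks="\n\n".join(context_blocks),
        user_question=user_question,
    )
```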
Tips
- Keep the combined prompt (system instructions + retrieved chunks + question) well within the model's context window (roughly 200K tokens for Claude Sonnet); larger contexts also mean higher latency and cost.
- Experiment with temperature=0 for more deterministic outputs.
- Adjust numberOfResults and chunk size based on latency vs. relevance trade-offs; the sketch below shows where the temperature and numberOfResults knobs live.
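Both knobs are plain request parameters. A minimal sketch of where they go, reusing the clients, KB_ID, and MODEL_ID defined in the pseudocode above:

```python
import json

# Retrieval side: pull fewer (or more) chunks per question.
retrieval = bedrock_agent_runtime.retrieve(
    knowledgeBaseId=KB_ID,
    retrievalQuery={"text": "How do I deploy this service?"},
    retrievalConfiguration={
        # Fewer results = a smaller prompt and lower latency, at the risk of missing context.
        "vectorSearchConfiguration": {"numberOfResults": 3}
    },
)

# Generation side: temperature=0 makes Claude's output (close to) deterministic.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0,
    "messages": [{"role": "user", "content": "..."}],  # the assembled RAG prompt goes here
})
response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=body)
```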
Recap
- Create a Bedrock Knowledge Base backed by a vector store (OpenSearch Serverless).
- Connect your internal data sources (Confluence, SharePoint, S3, etc.) and let Bedrock handle parsing, chunking, and embedding.
- Build a Slack bot that retrieves relevant chunks, feeds them to an LLM with a concise system prompt, and returns the answer to the user.
- Iterate on prompt and retrieval settings to optimise accuracy, latency, and cost.
Additional notes
The other dial we can turn is how we embed and store our internal knowledge. AWS has a great guide on how content chunking works for knowledge bases. The key takeaway is that, depending on how the data is structured, different chunking schemes will perform better. For example, lots of Confluence documentation has a natural hierarchical pattern with headings and body, so using hierarchical chunking can link information better and lead to improved retrieval performance.
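Chunking is configured on the data source rather than per query. As a rough illustration, a hierarchical chunking setup passed as the vectorIngestionConfiguration when creating or updating a data source via the bedrock-agent API might look like the following; the field names follow the Bedrock Agent API's chunking configuration as I understand it, and the token sizes are illustrative rather than recommendations:

```python
# Hypothetical hierarchical chunking settings for a Confluence-heavy data source.
vector_ingestion_configuration = {
    "chunkingConfiguration": {
        "chunkingStrategy": "HIERARCHICAL",
        "hierarchicalChunkingConfiguration": {
            "levelConfigurations": [
                {"maxTokens": 1500},  # parent chunks: roughly a page section with its heading
                {"maxTokens": 300},   # child chunks: the smaller units that get embedded and matched
            ],
            "overlapTokens": 60,  # overlap between adjacent child chunks
        },
    }
}
```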
Wrapping up
AI-enabled Slackbots are quickly becoming the front door to internal knowledge. With Amazon Bedrock Knowledge Bases, AWS has made it easy to build a RAG pipeline without, for the most part, having to operate and maintain a vector database yourself.
With powerful LLMs like ChatGPT and Claude, creating a Slack bot is easier than ever. If you would like to compare your solution with a working implementation, there is a slightly outdated but still functional example from the AWS team on GitHub that you can follow.