Chat With Your Database for Free: Build a Private AI Agent in Python
Source: Dev.to

You know the scenario. You are deep in flow state, solving a complex backend problem, when a Slack notification pops up from a Product Manager:
“Hey, can you quickly pull a list of all users who signed up in Germany last week? I need it for a meeting in 10 minutes.”
You know SQL. You could write that query in your sleep. But you shouldn’t have to.
Every minute you spend writing ad‑hoc reporting queries is a minute stolen from building actual product features.
The solution is not to build another complex admin dashboard.
The solution is to give your team a tool that speaks their language.
In this tutorial we will build a “Chat with your Database” application. It allows non‑technical users to ask questions in plain English while an AI writes and executes safe, read‑only SQL under the hood.
Best of all? It can run 100% locally.
- No API keys: $0.00 cost.
- No data‑privacy risks: Your production schema never leaves your machine.
- No “glue‑code” nightmares: We will use the new Model Context Protocol (MCP) to keep the architecture clean.
By the end of this guide you will have a deployable Docker container that turns “Hey, can you pull this data?” into “Sure, just ask the bot.”
Let’s reclaim your time.

The Blueprint
To keep this tutorial digestible I will focus on the core logic of the AI integration. I won’t bore you with 50 lines of Python import statements.
You can grab the complete code (including Docker setup and seed‑data generator) in the repository. Clone it, run docker-compose up, and follow along.
The Architecture: “The USB‑C for AI”
Before we write a single line of code, we need to understand why we are using the Model Context Protocol (MCP).
In the early days of LLMs, connecting a model to a database meant writing messy “glue code.” You had to manually inject table definitions into prompts, parse regex to find SQL, and handle every error edge‑case yourself. It was brittle and hard to maintain.
MCP solves this by standardising the connection. Think of it like a USB‑C port for AI:
- Resources (Context): The AI asks, “What data do I have?” Your server replies with the database schema.
- Tools (Action): The AI decides, “I need to run a query.” Your server executes the function safely.
We are decoupling the Brain (the LLM) from the Hands (the database execution).
Stack we will build
| Component | Technology |
|---|---|
| Brain | Mistral 7B running on Ollama (chosen for its reliable JSON output) |
| Interface | Streamlit app where the user chats |
| Server | Python script using FastMCP that holds the SQLite connection |

Step 1: The Environment (Docker is Your Friend)
We will use Docker to avoid the “it works on my machine” syndrome. The container holds our UI and our database, but can still talk to the Ollama instance running on your host.
Create a docker-compose.yml file. This is your one‑click setup:
```yaml
services:
  mcp-reporter:
    build: .
    container_name: pocket_analyst
    ports:
      - "8501:8501"
    volumes:
      - ./data:/app/data
    environment:
      # Crucial: lets the container talk to Ollama on your laptop
      - OLLAMA_HOST=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
```
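The build: . line expects a Dockerfile alongside the compose file. The repo includes one; as a rough sketch (the base image, requirements.txt, and entry point here are assumptions, not the repo's exact contents):

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8501
CMD ["streamlit", "run", "src/app.py", "--server.address=0.0.0.0"]
```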
Step 2: The Server (Giving the AI “Hands”)
We use the FastMCP library to create a server. This script does two things:
- Exposes the database schema (Context).
- Provides a function to query the database (Tool).
Note: Boiler‑plate code is omitted for brevity. See src/server.py in the repo for the full implementation.
```python
# ... standard imports and DB setup (see repo) ...
import json
import sqlite3

DB_PATH = "data/app.db"  # path assumed; match your Docker volume mount

@mcp.resource("sqlite://schema")
def get_schema() -> str:
    """Read the database schema and return it as context."""
    # Query sqlite_master so the context always matches the live DB
    with sqlite3.connect(DB_PATH) as conn:
        rows = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    return "\n".join(row[0] for row in rows)

@mcp.tool()
def query_database(sql: str) -> str:
    """
    Execute a read-only SQL query against the database.

    Args:
        sql: The SELECT statement to execute.

    Returns:
        The query results as a JSON-serialisable string.
    """
    # Simple read-only guard: refuse anything that is not a SELECT
    if not sql.lstrip().lower().startswith("select"):
        return json.dumps({"error": "Only SELECT statements are allowed."})
    with sqlite3.connect(DB_PATH) as conn:
        rows = conn.execute(sql).fetchall()
    return json.dumps(rows)
```
Step 3: The AI Brain (Mistral 7B via Ollama)
Run Ollama locally and pull the Mistral 7B model:
```bash
ollama serve          # starts the Ollama server (default port 11434)
ollama pull mistral   # downloads the model
```
In the host application (the Streamlit app) we configure an Ollama client that talks to this endpoint and instructs the model to:
- Request the schema via the get_schema resource.
- Generate a JSON-only response containing the SQL query.
- Call query_database with the generated SQL.
The model’s response is then forwarded back to the Streamlit UI.
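The loop described above can be sketched end to end. The helper below is hypothetical: answer_question and the injected get_schema / query_database callables are illustrative names, not the repo's API.

```python
import json

def answer_question(client, question: str, get_schema, query_database) -> str:
    """One round trip: schema -> SQL generation -> execution -> summary."""
    # 1. Fetch context from the MCP resource
    schema = get_schema()
    system = (
        f"Schema:\n{schema}\n"
        'Reply ONLY with JSON in the form {"sql": "SELECT ..."}'
    )
    # 2. Ask the model for a JSON-only response containing the SQL
    generated = client.chat(
        model="mistral",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        format="json",
    )
    sql = json.loads(generated["message"]["content"])["sql"]
    # 3. Execute the query via the MCP tool
    result = query_database(sql)
    # 4. Feed the rows back to the model for a plain-English summary
    summary = client.chat(
        model="mistral",
        messages=[{"role": "user", "content": f"Summarise for the user: {result}"}],
    )
    return summary["message"]["content"]
```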
Step 4: The Front‑End (Streamlit Chat)
A minimal Streamlit app (app.py) handles the chat UI:
```python
import streamlit as st
import httpx

st.title("🗂️ Chat with your Database")

if "messages" not in st.session_state:
    st.session_state.messages = []

def send_message(user_msg: str):
    st.session_state.messages.append({"role": "user", "content": user_msg})
    with httpx.Client(base_url="http://localhost:8000") as client:
        resp = client.post("/chat", json={"message": user_msg})
    assistant_msg = resp.json()["reply"]
    st.session_state.messages.append({"role": "assistant", "content": assistant_msg})

for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("Ask a question about the data…"):
    send_message(prompt)
```
Run the UI with:
```bash
streamlit run app.py
```
Visit http://localhost:8501 to start chatting.
Step 5: Putting It All Together
```bash
# 1️⃣ Build and start the containers
docker-compose up --build

# 2️⃣ In another terminal, start Ollama (if not already running)
ollama serve

# 3️⃣ Open the Streamlit UI in your browser
#    → http://localhost:8501
```
You should now be able to ask natural‑language questions like:
- “How many users signed up in Germany last week?”
- “Show me the total transaction amount per country for the past month.”
The AI will:
- Retrieve the schema.
- Generate safe, read‑only SQL.
- Execute the query against the SQLite DB.
- Return the results to the chat.
Recap
- Zero API keys – everything runs locally.
- Zero data leakage – your schema never leaves your machine.
- Clean architecture – MCP cleanly separates context, tools, and the LLM brain.
- Dockerised – one command to spin up the whole stack.
Give it a spin, tweak the prompts, and adapt the pattern to your own databases (Postgres, MySQL, etc.). Before you do, let's dig into two details that make or break this setup.
Going Deeper: The Brain (Why Mistral Wins)
Now we need the Host – the Streamlit app that manages the conversation.
We are using Ollama to run the model. A critical insight from testing: use Mistral 7B, not Llama 3.
While Llama 3 is powerful, smaller models often struggle with strict JSON formatting. You ask for { "sql": "…" } and Llama might reply with “Sure! Here is your JSON: { … }”. That extra conversational text breaks the parser.
Mistral 7B is remarkably disciplined. It shuts up and outputs the JSON we need to trigger our tool.
src/app.py (excerpt)
```python
# ... imports and Streamlit page config omitted ...

# The System Prompt is where we "program" the behavior.
# We explicitly tell the model about the schema and the output format.
SYSTEM_PROMPT = """
You are a Data Analyst.
1. Output ONLY a JSON object with the tool call.
2. The format must be: { "tool": "query_database", "sql": "SELECT..." }
3. Always convert cents to main currency units in the final answer.
"""

# ... inside the main chat loop ...
if prompt := st.chat_input("Ask a question..."):
    # 1. Send the user's request to Ollama.
    # Note: we use format='json' to force structured output.
    response = client.chat(
        model="mistral",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        format="json",
    )

    # 2. Parse the tool call.
    # (Full error handling and JSON parsing logic is in the repo.)
    tool_call = json.loads(response["message"]["content"])

    if tool_call["tool"] == "query_database":
        # 3. Execute the SQL via our MCP Server.
        result = call_mcp_tool(tool_call["sql"])

        # 4. Feed the data back to Mistral to summarize.
        final_answer = client.chat(
            model="mistral",
            messages=[
                # … previous messages …
                {"role": "user", "content": f"Data: {result}"},
            ],
        )
        st.write(final_answer["message"]["content"])
```

Field Test: Talking to Your Data
After running docker-compose up, the container automatically executes a seed_data.py script (included in the repo) that fills a SQLite database with 2,000 realistic “Mock Stripe” transactions.
Test 1 – The “Complex Filter”
Prompt
How many failed credit card payments did we receive from Germany yesterday?
Generated SQL
```sql
SELECT count(*)
FROM transactions
WHERE country_code = 'DE'
  AND status = 'failed'
  AND payment_method = 'card'
  AND created_at > date('now', '-1 day');
```
Test 2 – The “Ad‑Hoc Transformation”
Prompt
Calculate the total EUR payments per country in the last month, format in CSV.
Result (CSV)
```csv
"Country Code","Total EUR Payments"
"US",2873932
"DE",1093878
"GB",533471
"BR",492770
"FR",454523
"JP",348649
```
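Asking the model to format CSV works, but it can mangle quoting on edge cases. A more deterministic option is to let the model produce only the SQL and format the rows in code; a sketch (rows_to_csv is an illustrative helper, not part of the repo):

```python
import csv
import io

def rows_to_csv(header: list[str], rows: list[tuple]) -> str:
    """Serialise query results as CSV, quoting text fields only."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(header)   # header fields are strings, so they get quoted
    writer.writerows(rows)    # numeric fields stay unquoted
    return buf.getvalue()
```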
The fatal flaw of traditional dashboards is that they only answer the questions you predicted users would ask. This architecture solves the problems you never saw coming, replacing a finite set of hard‑coded buttons with the infinite flexibility of language.
