Chat With Your Database for Free: Build a Private AI Agent in Python

Published: December 27, 2025 at 08:54 AM EST
7 min read
Source: Dev.to

Fabio Hiroki

You know the scenario. You are deep in flow state, solving a complex backend problem, when a Slack notification pops up from a Product Manager:

“Hey, can you quickly pull a list of all users who signed up in Germany last week? I need it for a meeting in 10 minutes.”

You know SQL. You could write that query in your sleep. But you shouldn’t have to.
Every minute you spend writing ad‑hoc reporting queries is a minute stolen from building actual product features.

The solution is not to build another complex admin dashboard.
The solution is to give your team a tool that speaks their language.

In this tutorial we will build a “Chat with your Database” application. It allows non‑technical users to ask questions in plain English while an AI writes and executes safe, read‑only SQL under the hood.

Best of all? It can run 100% locally.

  • No API keys: $0.00 cost.
  • No data‑privacy risks: Your production schema never leaves your machine.
  • No “glue‑code” nightmares: We will use the new Model Context Protocol (MCP) to keep the architecture clean.

By the end of this guide you will have a deployable Docker container that turns “Hey, can you pull this data?” into “Sure, just ask the bot.”

Let’s reclaim your time.

The actual tool we are building today. No cloud, no API keys, running locally on a MacBook.

The Blueprint

To keep this tutorial digestible I will focus on the core logic of the AI integration. I won’t bore you with 50 lines of Python import statements.

You can grab the complete code (including Docker setup and seed‑data generator) in the repository. Clone it, run docker-compose up, and follow along.

The Architecture: “The USB‑C for AI”

Before we write a single line of code, we need to understand why we are using the Model Context Protocol (MCP).

In the early days of LLMs, connecting a model to a database meant writing messy “glue code.” You had to manually inject table definitions into prompts, parse regex to find SQL, and handle every error edge‑case yourself. It was brittle and hard to maintain.

MCP solves this by standardising the connection. Think of it like a USB‑C port for AI:

  • Resources (Context): The AI asks, “What data do I have?” Your server replies with the database schema.
  • Tools (Action): The AI decides, “I need to run a query.” Your server executes the function safely.

We are decoupling the Brain (the LLM) from the Hands (the database execution).

The Stack We Will Build

  • Brain: Mistral 7B running on Ollama (chosen for its reliable JSON output)
  • Interface: Streamlit app where the user chats
  • Server: Python script using FastMCP that holds the SQLite connection

Step 1: The Environment (Docker is Your Friend)

We will use Docker to avoid the “it works on my machine” syndrome. The container holds our UI and our database, but can still talk to the Ollama instance running on your host.

Create a docker-compose.yml file. This is your one‑click setup:

services:
  mcp-reporter:
    build: .
    container_name: pocket_analyst
    ports:
      - "8501:8501"
    volumes:
      - ./data:/app/data
    environment:
      # Crucial: lets the container talk to Ollama on your laptop
      - OLLAMA_HOST=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
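
The compose file's build: . expects a Dockerfile alongside it. The repo includes one; a minimal sketch (the Python version, requirements.txt, and src/app.py path are assumptions here) could look like:

```dockerfile
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Streamlit listens on 8501, matching the port mapping above
EXPOSE 8501
CMD ["streamlit", "run", "src/app.py", "--server.address=0.0.0.0"]
```

The --server.address=0.0.0.0 flag is what makes the app reachable through the published port from outside the container.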

Step 2: The Server (Giving the AI “Hands”)

We use the FastMCP library to create a server. This script does two things:

  1. Exposes the database schema (Context).
  2. Provides a function to query the database (Tool).

Note: Boilerplate code is omitted for brevity. See src/server.py in the repo for the full implementation.

# ... standard imports and DB setup (see repo) ...

@mcp.resource("sqlite://schema")
def get_schema() -> str:
    """Read the database schema and return it as context."""
    # Query sqlite_master so the context always matches the live database
    with sqlite3.connect(DB_PATH) as conn:  # DB_PATH comes from the setup above
        rows = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    return "\n".join(ddl for (ddl,) in rows if ddl)

@mcp.tool()
def query_database(sql: str) -> str:
    """
    Execute a read‑only SQL query against the database.

    Args:
        sql: The SELECT statement to execute.

    Returns:
        The query results as a JSON string.
    """
    # Guardrail: refuse anything that is not a plain SELECT
    if not sql.strip().lower().startswith("select"):
        return json.dumps({"error": "Only SELECT queries are allowed."})
    with sqlite3.connect(DB_PATH) as conn:
        rows = conn.execute(sql).fetchall()
    return json.dumps(rows)
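
Because the tool is plain Python, the query path can be sanity-checked without any model in the loop. A throwaway sketch against an in-memory SQLite database (the table and rows are invented for the demo; run_readonly mirrors, but is not, the real tool):

```python
import json
import sqlite3

def run_readonly(conn: sqlite3.Connection, sql: str) -> str:
    """Mimic the query_database tool: SELECT-only, JSON out."""
    if not sql.strip().lower().startswith("select"):
        return json.dumps({"error": "Only SELECT queries are allowed."})
    return json.dumps(conn.execute(sql).fetchall())

# Throwaway in-memory database with a few fake transactions
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, country_code TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "DE", "failed"), (2, "US", "succeeded"), (3, "DE", "failed")],
)

print(run_readonly(conn, "SELECT count(*) FROM transactions "
                         "WHERE country_code = 'DE' AND status = 'failed'"))
# → [[2]]
print(run_readonly(conn, "DROP TABLE transactions"))
# → {"error": "Only SELECT queries are allowed."}
```

A few asserts like these make a handy regression test before you ever let an LLM generate the SQL.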

Step 3: The AI Brain (Mistral 7B via Ollama)

Run Ollama locally and pull the Mistral 7B model:

ollama serve          # starts the Ollama server (default port 11434)
ollama pull mistral   # downloads the model

On the host side (the Streamlit app) we configure an Ollama client pointed at this endpoint and instruct the model to:

  1. Request the schema via the get_schema resource.
  2. Generate a JSON‑only response containing the SQL query.
  3. Call query_database with the generated SQL.

The model’s response is then forwarded back to the Streamlit UI.
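
That round trip can be sketched with stubs standing in for Ollama and the MCP server (every name below is invented for illustration):

```python
import json

def fake_model(prompt: str) -> str:
    # Stand-in for the LLM: always emits a JSON tool call
    return json.dumps({"tool": "query_database",
                       "sql": "SELECT count(*) FROM users WHERE country = 'DE'"})

def query_database(sql: str) -> str:
    # Stand-in for the MCP tool (the real version hits SQLite)
    return json.dumps([[42]])

def chat_round(user_msg: str) -> str:
    """One turn: model emits a tool call, we dispatch it, data comes back."""
    call = json.loads(fake_model(user_msg))
    if call["tool"] == "query_database":
        return query_database(call["sql"])
    return json.dumps({"error": "unknown tool"})

print(chat_round("How many users signed up in Germany?"))
# → [[42]]
```

Swap fake_model for a real Ollama call and query_database for the MCP tool and the shape of the loop stays the same.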

Step 4: The Front‑End (Streamlit Chat)

A minimal Streamlit front‑end handles the chat UI. The sketch below assumes a small backend exposing a /chat endpoint on port 8000; the repo’s src/app.py (excerpted further down) talks to Ollama directly instead.

import streamlit as st
import httpx

st.title("🗂️ Chat with your Database")

if "messages" not in st.session_state:
    st.session_state.messages = []

def send_message(user_msg: str):
    st.session_state.messages.append({"role": "user", "content": user_msg})
    with httpx.Client(base_url="http://localhost:8000") as client:
        resp = client.post("/chat", json={"message": user_msg})
        assistant_msg = resp.json()["reply"]
    st.session_state.messages.append({"role": "assistant", "content": assistant_msg})

for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("Ask a question about the data…"):
    send_message(prompt)

Run the UI with:

streamlit run app.py

Visit http://localhost:8501 to start chatting.

Step 5: Putting It All Together

# 1️⃣ Build and start the containers
docker-compose up --build

# 2️⃣ In another terminal, start Ollama (if not already running)
ollama serve

# 3️⃣ Open the Streamlit UI in your browser
#    → http://localhost:8501

You should now be able to ask natural‑language questions like:

  • “How many users signed up in Germany last week?”
  • “Show me the total transaction amount per country for the past month.”

The AI will:

  1. Retrieve the schema.
  2. Generate safe, read‑only SQL.
  3. Execute the query against the SQLite DB.
  4. Return the results to the chat.

Recap

  • Zero API keys – everything runs locally.
  • Zero data leakage – your schema never leaves your machine.
  • Clean architecture – MCP cleanly separates context, tools, and the LLM brain.
  • Dockerised – one command to spin up the whole stack.

Give it a spin, tweak the prompts, and adapt the pattern to your own databases (Postgres, MySQL, etc.). Two deep dives below show why Mistral was the right choice and what the tool looks like in action.

Deep Dive: The Brain (Why Mistral Wins)

The Host – the Streamlit app that manages the conversation – is only half of the story; the model behind it matters just as much.

We are using Ollama to run the model. A critical insight from testing: use Mistral 7B, not Llama 3.

While Llama 3 is powerful, smaller models often struggle with strict JSON formatting. You ask for

{ "sql": "…" }

and Llama might reply with “Sure! Here is your JSON: { … }”. That extra conversational text breaks the parser.

Mistral 7B is remarkably disciplined. It shuts up and outputs the JSON we need to trigger our tool.
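
With format="json" and Mistral this is rarely needed, but if you want a belt-and-braces fallback for chatty models, a small extraction helper (purely illustrative, not in the repo) can dig the object out of the noise:

```python
import json

def extract_json(reply: str) -> dict:
    """Pull the first balanced {...} object out of a possibly chatty reply."""
    start = reply.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    depth = 0
    # Naive brace matching: braces inside string values would confuse it,
    # which is fine for simple {"tool": ..., "sql": ...} payloads
    for i, ch in enumerate(reply[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(reply[start:i + 1])
    raise ValueError("unbalanced JSON object")

chatty = 'Sure! Here is your JSON: { "sql": "SELECT 1" }'
print(extract_json(chatty))
# → {'sql': 'SELECT 1'}
```

If even this fails, the pragmatic move is to re-prompt the model rather than keep patching the parser.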

src/app.py (excerpt)

# ... imports and Streamlit page config omitted ...

# The System Prompt is where we "program" the behavior
# We explicitly tell the model about the schema and the output format
SYSTEM_PROMPT = """
You are a Data Analyst. 
1. Output ONLY a JSON object with the tool call.
2. The format must be: { "tool": "query_database", "sql": "SELECT..." }
3. Always convert cents to main currency units in the final answer.
"""

# ... inside the main chat loop ...

if prompt := st.chat_input("Ask a question..."):
    # 1. Send the user's request to Ollama
    # Note: We use format='json' to force structured output
    response = client.chat(
        model="mistral",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",   "content": prompt}
        ],
        format="json"
    )

    # 2. Parse the tool call
    # (Full error handling and JSON parsing logic is in the repo)
    tool_call = json.loads(response['message']['content'])

    if tool_call['tool'] == 'query_database':
        # 3. Execute the SQL via our MCP Server
        result = call_mcp_tool(tool_call['sql'])

        # 4. Feed the data back to Mistral to summarize
        final_answer = client.chat(
            model="mistral",
            messages=[
                # … previous messages …
                {"role": "user", "content": f"Data: {result}"}
            ]
        )
        st.write(final_answer['message']['content'])

Demo: Talking to Your Data

After running docker-compose up, the container automatically executes a seed_data.py script (included in the repo) that fills a SQLite database with 2,000 realistic “Mock Stripe” transactions.

Test 1 – The “Complex Filter”

Prompt

How many failed credit card payments did we receive from Germany yesterday?

Generated SQL

SELECT count(*)
FROM transactions
WHERE country_code = 'DE'
  AND status = 'failed'
  AND payment_method = 'card'
  AND created_at > date('now', '-1 day');

Test 2 – The “Ad‑Hoc Transformation”

Prompt

Calculate the total EUR payments per country in the last month, format in CSV.

Result (CSV)

"Country Code","Total EUR Payments"
"US",2873932
"DE",1093878
"GB",533471
"BR",492770
"FR",454523
"JP",348649

The fatal flaw of traditional dashboards is that they only answer the questions you predicted users would ask. This architecture solves the problems you never saw coming, replacing a finite set of hard‑coded buttons with the infinite flexibility of language.

Let’s keep growing.
Subscribe to level up your engineering career with confidence.
