Building a Safe AI Database Assistant with Azure OpenAI, LangChain & Function Calling
Source: Dev.to
From raw CSVs to a production‑ready AI assistant that queries data safely — without hallucinating SQL.
In this post, I’ll walk through how I built an AI‑powered data analyst using:
- Azure OpenAI
- LangChain
- LangGraph
- Function Calling
- SQLite
The assistant can:
- Analyze CSV data using pandas
- Query a SQL database safely
- Choose predefined backend functions automatically
- Explain results clearly
- Avoid hallucinations and unsafe SQL
🧩 Problem Statement
When working with AI models and databases, common problems include:
- ❌ Hallucinated SQL queries
- ❌ Unsafe `eval` or raw SQL execution
- ❌ No control over what the model can access
- ❌ No explanation of how results were computed
Goal: Build an AI assistant that:
- Answers analytical questions about COVID data
- Uses only allowed tools
- Never guesses
- Explains every answer
📊 Dataset
We use the COVID all‑states history dataset, which includes:
- state
- date
- hospitalizedIncrease
- positiveIncrease
- …and more
The dataset is used in two forms:
- A pandas DataFrame
- A SQLite database
🧱 Architecture Overview
User Question
↓
Azure OpenAI (Assistant / LangChain)
↓
Tool Selection (Function / SQL / DataFrame)
↓
Safe Backend Execution
↓
Result
↓
Final Explanation
Key idea:
- The model decides WHAT to do.
- Your backend decides HOW it is done.
🔹 Part 1: Talking to Azure OpenAI via LangChain
Connect to Azure OpenAI using AzureChatOpenAI:
from langchain_openai import AzureChatOpenAI
from langchain_core.messages import HumanMessage

llm = AzureChatOpenAI(
    azure_endpoint="https://.cognitiveservices.azure.com/",
    api_key="YOUR_API_KEY",
    api_version="2024-12-01-preview",
    model="gpt-4o-mini",
)
Sanity check
response = llm.invoke([
    HumanMessage(content="Hello, Azure OpenAI via LangChain!")
])
print(response.content)
🔹 Part 2: DataFrame Agent (CSV Analysis)
Load the CSV into pandas and expose controlled computation via a tool.
DataFrame Tool
from langchain_core.tools import tool

@tool
def run_df(query: str) -> str:
    """Run Python code on the global dataframe `df` and return the result."""
    return str(eval(query))
⚠️ Note: In production, replace `eval` with a restricted execution layer.
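As a concrete illustration of that warning, here is one minimal sketch of a restricted execution layer (my own example, not from the original post): it exposes only `df` and a few safe builtins, and rejects dunder access, which blocks the most common `eval` escape routes. The sample DataFrame values are made up.

```python
# Sketch of a restricted execution layer for run_df (illustrative, not
# production-hardened). Only `df` and a few whitelisted builtins are
# visible, and dunder access is rejected outright.
import pandas as pd

df = pd.DataFrame({"state": ["NY", "CA"], "positiveIncrease": [100, 250]})

ALLOWED_NAMES = {"df": df, "len": len, "min": min, "max": max, "sum": sum}

def safe_run_df(query: str) -> str:
    if "__" in query:  # crude guard against dunder-based escapes
        raise ValueError("Disallowed expression")
    # Empty __builtins__ plus an explicit allow-list of names
    result = eval(query, {"__builtins__": {}}, ALLOWED_NAMES)
    return str(result)

print(safe_run_df("df['positiveIncrease'].sum()"))  # -> 350
```

For real deployments you would go further (AST validation, timeouts, a sandboxed process), but even this small layer removes arbitrary code execution.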
Enforcing Tool Usage
llm_with_tools = llm.bind_tools([run_df])
The prompt forces the model to:
- Use the tool
- Perform actual pandas calculations
- Explain results
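A system prompt that enforces those three rules might look like this (my wording, not the author's exact prompt; `run_df` is the tool defined above):

```python
# A possible system prompt that forces tool use instead of answering
# from memory. The exact phrasing is illustrative.
SYSTEM_PROMPT = """You are a data analyst for a pandas DataFrame `df`.
Rules:
1. NEVER answer from memory; ALWAYS call the run_df tool.
2. Pass a single Python expression over `df`,
   e.g. df['positiveIncrease'].sum().
3. After the tool returns, explain the result in plain English."""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What is the total positive increase?"},
]
# response = llm_with_tools.invoke(messages)
# response.tool_calls now contains the run_df call to execute.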
🔹 Part 3: Moving from CSV → SQL (SQLite)
Convert the CSV into SQLite:
engine = create_engine("sqlite:///./db/test.db")
df.to_sql(
name="all_states_history",
con=engine,
if_exists="replace",
index=False
)
Now the same dataset can be queried via SQL.
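As a quick, dependency-free check (my own example) that the table written by `df.to_sql` is queryable with plain SQL, the stdlib `sqlite3` module stands in here for the SQLAlchemy engine; the rows are made-up sample values:

```python
# Verify the SQLite table answers the same questions as the DataFrame.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_states_history "
    "(state TEXT, date TEXT, hospitalizedIncrease INTEGER)"
)
conn.executemany(
    "INSERT INTO all_states_history VALUES (?, ?, ?)",
    [("NY", "2021-03-05", 50), ("NY", "2021-03-06", 60), ("CA", "2021-03-05", 30)],
)
total = conn.execute(
    "SELECT SUM(hospitalizedIncrease) FROM all_states_history WHERE state = 'NY'"
).fetchone()[0]
print(total)  # -> 110
```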
🔹 Part 4: SQL Agent with LangGraph
Create a ReAct agent using LangGraph:
from langgraph.prebuilt import create_react_agent

agent_executor_SQL = create_react_agent(
    model=llm,
    tools=toolkit.get_tools()  # tools from LangChain's SQLDatabaseToolkit
)
The system prompt enforces:
- Only valid tables
- Only specific columns
- No hallucinated values
- Markdown‑only output
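One way to encode those four constraints in the agent's system prompt (my phrasing; the column list follows the dataset described above):

```python
# Constraints baked into the SQL agent's system prompt.
ALLOWED_TABLE = "all_states_history"
ALLOWED_COLUMNS = ["state", "date", "hospitalizedIncrease", "positiveIncrease"]

SQL_SYSTEM_PROMPT = f"""You are a SQL analyst.
- Query ONLY the table `{ALLOWED_TABLE}`.
- Use ONLY these columns: {', '.join(ALLOWED_COLUMNS)}.
- If the answer is not in the data, say so; NEVER invent values.
- Format your final answer as Markdown."""
```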
🔹 Part 5: Function Calling (No Raw SQL)
Instead of letting the model generate SQL, define pre‑approved backend functions.
Example Functions
def get_hospitalized_increase_for_state_on_date(state_abbr, specific_date):
    ...

def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    ...
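The post elides the function bodies, but a possible implementation of the first one looks like this (my sketch, using the SQLite table from Part 3; parameters are bound with `?` placeholders, so the model can never inject raw SQL):

```python
# Hypothetical body for one pre-approved backend function.
import sqlite3

def get_hospitalized_increase_for_state_on_date(state_abbr, specific_date,
                                                db_path="./db/test.db"):
    conn = sqlite3.connect(db_path)
    try:
        # Bound parameters: the model supplies values, never SQL text.
        row = conn.execute(
            "SELECT hospitalizedIncrease FROM all_states_history "
            "WHERE state = ? AND date = ?",
            (state_abbr, specific_date),
        ).fetchone()
    finally:
        conn.close()
    return {"state": state_abbr, "date": specific_date,
            "hospitalizedIncrease": row[0] if row else None}
```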
Function Registry (Critical!)
FUNCTION_MAP = {
"get_hospitalized_increase_for_state_on_date": get_hospitalized_increase_for_state_on_date,
"get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}
This ensures:
- ✅ Only allowed functions run
- ❌ No arbitrary code execution
🔹 Part 6: Azure OpenAI Function Calling (No Assistant API)
Use Chat Completions + functions:
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    functions=functions,
    function_call="auto"
)
If the model calls a function:
- Extract arguments
- Route via FUNCTION_MAP
- Execute backend logic
- Send result back
- Get final grounded answer
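The routing step can be sketched like this (my example: the mock object mimics the shape of `response.choices[0].message.function_call` from the OpenAI client, and the function body is a stub with made-up values):

```python
# Routing a model-requested function call through FUNCTION_MAP.
import json

def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    # Stub with invented values, standing in for the real DB lookup.
    return {"state": state_abbr, "date": specific_date, "positiveIncrease": 1234}

FUNCTION_MAP = {
    "get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}

# In a real run this comes from response.choices[0].message.function_call.
class MockFunctionCall:
    name = "get_positive_cases_for_state_on_date"
    arguments = json.dumps({"state_abbr": "NY", "specific_date": "2021-03-05"})

fc = MockFunctionCall()
fn = FUNCTION_MAP.get(fc.name)
if fn is None:
    # Anything not in the registry is refused, never executed.
    raise ValueError(f"Model requested unknown function: {fc.name}")
result = fn(**json.loads(fc.arguments))
print(result["positiveIncrease"])  # -> 1234
```

The result is then appended to `messages` as a `function` role message and sent back for the final grounded answer.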
🔹 Part 7: Assistant API (Persistent Context)
Creating the Assistant
assistant = client.beta.assistants.create(
    name="Covid Data Assistant",
    model="gpt-4o-mini",
    tools=[{"type": "function", "function": fn} for fn in functions]
)
Assistant Loop (Key Concept)
import json, time

while True:
    run_status = client.beta.threads.runs.retrieve(
        thread_id=thread.id, run_id=run.id
    )
    if run_status.status == "requires_action":
        outputs = []
        for call in run_status.required_action.submit_tool_outputs.tool_calls:
            fn = FUNCTION_MAP[call.function.name]  # dispatch via registry
            result = fn(**json.loads(call.function.arguments))
            outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
        client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id, run_id=run.id, tool_outputs=outputs
        )
    elif run_status.status in ("completed", "failed"):
        break
    time.sleep(1)
The assistant remembers conversation context, while still enforcing strict tool usage and safe execution.
🧠 Key Takeaways
✅ What This Design Solves
- Prevents SQL hallucinations
- Enforces backend safety
- Keeps AI answers grounded in data
- Scales cleanly as tools grow
🧩 Mental Model
| Layer | Responsibility |
|---|---|
| LLM | Reasoning & intent |
| Assistant | Tool selection |
| Backend | Data access |
| Function Map | Security |
🎯 When to Use What?
| Use Case | Best Choice |
|---|---|
| One‑shot queries | Chat + function calling |
| Multi‑turn analysis | Assistant API |
| CSV exploration | DataFrame tools |
| Production DB | Predefined SQL functions |
🚀 Final Thoughts
This approach mirrors how real production AI systems are built:
- AI decides what
- Backend controls how
Key benefits:
- Data remains authoritative
- Explanations remain transparent
Connect With Me
Let’s learn and build cool data‑science and AI projects together!
- LinkedIn:
- GitHub:
📩 Comments below are always welcome!
