Building a Safe AI Database Assistant with Azure OpenAI, LangChain & Function Calling

Published: December 19, 2025 at 01:50 AM EST
3 min read
Source: Dev.to

Khushi Singla

From raw CSVs to a production‑ready AI assistant that queries data safely — without hallucinating SQL.

In this post, I’ll walk through how I built an AI‑powered data analyst using:

  • Azure OpenAI
  • LangChain
  • LangGraph
  • Function Calling
  • SQLite

The assistant can:

  • Analyze CSV data using pandas
  • Query a SQL database safely
  • Choose predefined backend functions automatically
  • Explain results clearly
  • Avoid hallucinations and unsafe SQL

🧩 Problem Statement

When working with AI models and databases, common problems include:

  • ❌ Hallucinated SQL queries
  • ❌ Unsafe eval or raw SQL execution
  • ❌ No control over what the model can access
  • ❌ No explanation of how results were computed

Goal: Build an AI assistant that:

  • Answers analytical questions about COVID data
  • Uses only allowed tools
  • Never guesses
  • Explains every answer

📊 Dataset

We use the COVID all‑states history dataset, which includes:

  • state
  • date
  • hospitalizedIncrease
  • positiveIncrease
  • …and more

The dataset is first used as:

  • A pandas DataFrame
  • A SQLite database
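Before wiring anything to an LLM, it helps to see the shape of the data and the kind of aggregate question the assistant will answer. A minimal sketch, using a tiny made-up stand-in for the real all-states-history CSV (the column names come from the list above; the values are invented for illustration):

```python
import pandas as pd

# Tiny stand-in for the real all-states-history CSV.
# Columns match the dataset; the rows are made up for illustration.
df = pd.DataFrame({
    "state": ["NY", "NY", "CA"],
    "date": ["2021-03-01", "2021-03-02", "2021-03-01"],
    "hospitalizedIncrease": [50, 60, 40],
    "positiveIncrease": [500, 450, 700],
})

# The kind of aggregate question the assistant will be asked to answer.
total_ny = int(df.loc[df["state"] == "NY", "positiveIncrease"].sum())
print(total_ny)  # 950
```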

🧱 Architecture Overview

User Question
  ↓
Azure OpenAI (Assistant / LangChain)
  ↓
Tool Selection (Function / SQL / DataFrame)
  ↓
Safe Backend Execution
  ↓
Result
  ↓
Final Explanation

Key idea:

  • The model decides WHAT to do.
  • Your backend decides HOW it is done.

🔹 Part 1: Talking to Azure OpenAI via LangChain

Connect to Azure OpenAI using AzureChatOpenAI:

from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    azure_endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    api_key="YOUR_API_KEY",
    api_version="2024-12-01-preview",
    model="gpt-4o-mini"
)

Sanity check

from langchain_core.messages import HumanMessage

response = llm.invoke([
    HumanMessage(content="Hello, Azure OpenAI via LangChain!")
])
print(response.content)

🔹 Part 2: DataFrame Agent (CSV Analysis)

Load the CSV into pandas and expose controlled computation via a tool.

DataFrame Tool

from langchain_core.tools import tool

@tool
def run_df(query: str) -> str:
    """Run Python code on the global dataframe `df` and return the result."""
    return str(eval(query))

⚠️ Note: In production, replace eval with a restricted execution layer.

Enforcing Tool Usage

llm_with_tools = llm.bind_tools([run_df])

The prompt forces the model to:

  • Use the tool
  • Perform actual pandas calculations
  • Explain results
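When the model obeys the prompt, it returns a tool call rather than a free-text guess, and the backend executes it. A dependency-free simulation of that round trip (no Azure call is made; the query string and call id are hypothetical, shaped like the entries in `ai_msg.tool_calls`):

```python
import pandas as pd

df = pd.DataFrame({"positiveIncrease": [5, 9, 3]})

def run_df(query: str) -> str:
    # Same idea as the @tool above, with eval restricted to the dataframe namespace.
    return str(eval(query, {"df": df}))

# A tool call shaped like what `llm_with_tools` surfaces in `ai_msg.tool_calls`.
tool_call = {
    "name": "run_df",
    "args": {"query": "df['positiveIncrease'].max()"},
    "id": "call_1",
}

# The backend, not the model, runs the computation.
result = run_df(**tool_call["args"])
print(result)  # '9'
```

The answer comes from an actual pandas calculation, which is exactly the grounding the prompt is designed to enforce.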

🔹 Part 3: Moving from CSV → SQL (SQLite)

Convert the CSV into SQLite:

from sqlalchemy import create_engine

engine = create_engine("sqlite:///./db/test.db")

df.to_sql(
    name="all_states_history",
    con=engine,
    if_exists="replace",
    index=False
)

Now the same dataset can be queried via SQL.
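To sanity-check the migration, you can run a plain SQL query against the new table. A self-contained sketch using an in-memory SQLite engine as a stand-in for `./db/test.db` (same table name; the rows are made up):

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory stand-in for ./db/test.db, same table name, made-up rows.
engine = create_engine("sqlite://")
pd.DataFrame({
    "state": ["NY", "CA"],
    "date": ["2021-03-01", "2021-03-01"],
    "hospitalizedIncrease": [50, 40],
}).to_sql("all_states_history", con=engine, if_exists="replace", index=False)

# The same dataset, now answered via SQL instead of pandas.
rows = pd.read_sql(
    "SELECT state, hospitalizedIncrease FROM all_states_history "
    "WHERE date = '2021-03-01' ORDER BY state",
    con=engine,
)
print(rows)
```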

🔹 Part 4: SQL Agent with LangGraph

Create a ReAct agent using LangGraph:

from langgraph.prebuilt import create_react_agent
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase

toolkit = SQLDatabaseToolkit(db=SQLDatabase(engine), llm=llm)
agent_executor_SQL = create_react_agent(model=llm, tools=toolkit.get_tools())

The system prompt enforces:

  • Only valid tables
  • Only specific columns
  • No hallucinated values
  • Markdown‑only output
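The constraints above live in the system prompt. A hypothetical prompt capturing them (the exact wording is an assumption, not the post's original text):

```python
# Hypothetical system prompt capturing the constraints listed above.
SQL_AGENT_SYSTEM_PROMPT = """You are a data analyst for a SQLite database.
Rules:
- Query ONLY the table `all_states_history`.
- Use ONLY columns that exist in that table
  (e.g. state, date, hospitalizedIncrease, positiveIncrease).
- If the data cannot answer the question, say so; never invent values.
- Format every final answer in Markdown.
"""
print(SQL_AGENT_SYSTEM_PROMPT)
```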

🔹 Part 5: Function Calling (No Raw SQL)

Instead of letting the model generate SQL, define pre‑approved backend functions.

Example Functions

def get_hospitalized_increase_for_state_on_date(state_abbr, specific_date):
    ...
def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    ...
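The post elides the function bodies; one plausible implementation is a fixed, parameterized query, so the model never writes SQL itself. A self-contained sketch (the in-memory table and its values are made up so the snippet runs anywhere; the real version would query `./db/test.db`):

```python
import sqlite3

# Tiny in-memory copy of the Part 3 table so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_states_history "
    "(state TEXT, date TEXT, hospitalizedIncrease INTEGER)"
)
conn.execute("INSERT INTO all_states_history VALUES ('NY', '2021-03-05', 72)")

def get_hospitalized_increase_for_state_on_date(state_abbr, specific_date):
    """Pre-approved, parameterized query — the model never writes SQL itself."""
    row = conn.execute(
        "SELECT hospitalizedIncrease FROM all_states_history "
        "WHERE state = ? AND date = ?",
        (state_abbr, specific_date),
    ).fetchone()
    return row[0] if row else None

print(get_hospitalized_increase_for_state_on_date("NY", "2021-03-05"))  # 72
```

Because the SQL is fixed and the inputs are bound parameters, injection and hallucinated queries are ruled out by construction.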

Function Registry (Critical!)

FUNCTION_MAP = {
    "get_hospitalized_increase_for_state_on_date": get_hospitalized_increase_for_state_on_date,
    "get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}

This ensures:

  • ✅ Only allowed functions run
  • ❌ No arbitrary code execution
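A minimal dispatcher over the registry might look like this (the `dispatch` helper and its stubbed function body are my additions for illustration; any name outside `FUNCTION_MAP` is rejected outright):

```python
import json

def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    return 1234  # stub standing in for the real parameterized DB lookup

FUNCTION_MAP = {
    "get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}

def dispatch(function_name: str, arguments_json: str):
    """Route a model-requested call; anything outside the registry is rejected."""
    fn = FUNCTION_MAP.get(function_name)
    if fn is None:
        raise ValueError(f"Function not allowed: {function_name}")
    return fn(**json.loads(arguments_json))

print(dispatch(
    "get_positive_cases_for_state_on_date",
    '{"state_abbr": "NY", "specific_date": "2021-03-05"}',
))  # 1234
```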

🔹 Part 6: Azure OpenAI Function Calling (No Assistant API)

Use Chat Completions + functions:

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    functions=functions,
    function_call="auto"
)

If the model calls a function:

  1. Extract arguments
  2. Route via FUNCTION_MAP
  3. Execute backend logic
  4. Send result back
  5. Get final grounded answer
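The five steps above can be simulated without an API key. Here the model reply is mocked in the shape Chat Completions uses for `function_call` (the stub function and argument values are hypothetical):

```python
import json

# Simulated model reply, shaped like a Chat Completions `function_call` payload.
function_call = {
    "name": "get_positive_cases_for_state_on_date",
    "arguments": '{"state_abbr": "NY", "specific_date": "2021-03-05"}',
}

def get_positive_cases_for_state_on_date(state_abbr, specific_date):
    return 1234  # stub for the real parameterized DB query

FUNCTION_MAP = {
    "get_positive_cases_for_state_on_date": get_positive_cases_for_state_on_date,
}

# Steps 1–3: extract arguments, route via FUNCTION_MAP, execute backend logic.
args = json.loads(function_call["arguments"])
result = FUNCTION_MAP[function_call["name"]](**args)

# Step 4: send the result back as a `function` role message, so step 5's
# final answer is grounded in real data rather than a guess.
follow_up = {"role": "function", "name": function_call["name"], "content": str(result)}
print(follow_up["content"])  # '1234'
```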

🔹 Part 7: Assistant API (Persistent Context)

Creating the Assistant

assistant = client.beta.assistants.create(
    name="Covid Data Assistant",
    model="gpt-4o-mini",
    tools=[{"type": "function", "function": fn} for fn in functions]
)

Assistant Loop (Key Concept)

import json

while True:
    run_status = client.beta.threads.runs.retrieve(
        thread_id=thread.id, run_id=run.id
    )
    if run_status.status == "requires_action":
        tool_outputs = []
        for call in run_status.required_action.submit_tool_outputs.tool_calls:
            fn = FUNCTION_MAP[call.function.name]  # dispatch via the registry
            result = fn(**json.loads(call.function.arguments))
            tool_outputs.append({"tool_call_id": call.id, "output": str(result)})
        client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id, run_id=run.id, tool_outputs=tool_outputs
        )
    elif run_status.status in ("completed", "failed"):
        break

The assistant remembers conversation context, while still enforcing strict tool usage and safe execution.

🧠 Key Takeaways

  • What This Design Solves
    • Prevents SQL hallucinations
    • Enforces backend safety
    • Keeps AI answers grounded in data
    • Scales cleanly as tools grow

🧩 Mental Model

  • LLM: Reasoning & intent
  • Assistant: Tool selection
  • Backend: Data access
  • Function Map: Security

🎯 When to Use What?

  • One‑shot queries: Chat + function calling
  • Multi‑turn analysis: Assistant API
  • CSV exploration: DataFrame tools
  • Production DB: Predefined SQL functions

🚀 Final Thoughts

This approach mirrors how real production AI systems are built:

  • AI decides what
  • Backend controls how

Key benefits:

  • Data remains authoritative
  • Explanations remain transparent

Connect With Me

Let’s learn and build cool data‑science and AI projects together!

  • LinkedIn:
  • GitHub:

📩 Comments below are always welcome!
