Hello again, here's a LangChain Ollama helper sheet :)

Published: February 9, 2026 at 10:19 AM EST
10 min read
Source: Dev.to

LangChain + Ollama: A Practical Guide to Building AI Agents with Python

This guide teaches you how to build real, working AI agents using Ollama and LangChain.

In this guide you’ll discover:

  • ✅ How to set up Ollama + LangChain (≈ 10 min)
  • ✅ When to use ollama.chat() vs. ChatOllama() (quick decision tree)
  • ✅ How to build agents that remember things (persistent storage)
  • ✅ Real, copy‑&‑paste‑ready examples
  • ✅ Performance‑tuning tips for your machine
  • ✅ How to deploy to production

Decision Flowchart

flowchart TD
    A[Want to use AI in your Python code?] --> B{Building a multi-step AI agent that makes decisions and uses tools?}
    B -->|YES| C["ChatOllama(): agents, tools, state management, production"]
    B -->|NO| D["ollama.chat(): simple queries, streaming, speed, prototyping"]
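The same decision rule can be captured in a tiny helper function (the function name and parameters are made up here, purely to make the flowchart concrete):

```python
def choose_api(needs_tools: bool, needs_state: bool = False) -> str:
    """Return which API fits, following the flowchart above."""
    if needs_tools or needs_state:
        return "ChatOllama()"   # agents, tools, state, production
    return "ollama.chat()"      # simple queries, streaming, prototyping

print(choose_api(needs_tools=True))    # multi-step agent with tools
print(choose_api(needs_tools=False))   # simple one-shot query
```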

Performance Benchmarks

| Operation | Typical latency | Notes |
|---|---|---|
| `ollama.chat()` response | 15–25 ms | Fastest |
| `ChatOllama()` response | 35–55 ms | More features |
| Streaming first token | 5–20 ms | Real-time feedback |
| Tool execution | 2–12 ms | Overhead varies |

Tip: On a laptop with 8 GB RAM you’ll typically see responses under 100 ms. Cloud APIs usually add 500 ms+ of network latency.
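Latency depends heavily on your hardware and model size, so it's worth measuring your own setup. A minimal timing helper, shown here against a stub function (swap in a real `ollama.chat(...)` call to benchmark it):

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Stub standing in for a call like ollama.chat(model=..., messages=...)
result, ms = time_call(lambda: "4")
print(f"answer={result} latency={ms:.2f} ms")
```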

ollama.chat() – Simple Queries & Streaming

When to use

You just need to ask the AI something and get an answer (or stream the answer).

Start Ollama

# Terminal 1: start the Ollama server
ollama serve

Basic request

import ollama

response = ollama.chat(
    model="qwen2.5-coder:latest",
    messages=[{"role": "user", "content": "What is 2 + 2?"}]
)

print(response["message"]["content"])
# → 2 + 2 equals 4

Streaming (AI “thinking” in real time)

import ollama

print("AI: ", end="", flush=True)

for chunk in ollama.chat(
    model="qwen2.5-coder:latest",
    messages=[{"role": "user", "content": "Write a haiku about code"}],
    stream=True,
):
    print(chunk["message"]["content"], end="", flush=True)

print()   # newline

Output

AI: Lines of logic dance,
Bugs and fixes both take turns—
Code shapes the future.
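If you also need the full text after streaming (for logging or appending to the conversation history), collect the chunks while printing. A minimal sketch using stub chunks shaped like Ollama's streaming responses (replace the stub with the real `ollama.chat(..., stream=True)` iterator):

```python
def collect_stream(chunks) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        piece = chunk["message"]["content"]
        print(piece, end="", flush=True)
        parts.append(piece)
    print()  # final newline
    return "".join(parts)

# Stub chunks mimicking the streaming response format
fake_chunks = [{"message": {"content": t}} for t in ("Hello", ", ", "world")]
full_text = collect_stream(fake_chunks)
print(repr(full_text))  # → 'Hello, world'
```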

Conversational loop (context memory)

import ollama

messages = []

while True:
    user_input = input("You: ")
    if not user_input:
        break

    # add user message
    messages.append({"role": "user", "content": user_input})

    # get response
    response = ollama.chat(
        model="qwen2.5-coder:latest",
        messages=messages,
    )
    ai_reply = response["message"]["content"]
    print(f"\nAI: {ai_reply}\n")

    # add AI reply so the model remembers the context
    messages.append({"role": "assistant", "content": ai_reply})

Sample conversation

You: What is a lambda function in Python?
AI: A lambda function is a small anonymous function...

You: How is it different from a regular function?
AI: Great question! The key differences are...

The AI keeps track of the conversation because we retain the full messages list.
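One caveat: the `messages` list grows without bound, and the whole list is re-sent on every turn. A simple sliding-window sketch for capping the history (the cap of 20 is an arbitrary example, not a recommendation):

```python
def trim_history(messages: list, max_messages: int = 20) -> list:
    """Keep only the most recent messages so the context stays bounded."""
    if len(messages) <= max_messages:
        return messages
    return messages[-max_messages:]

history = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
history = trim_history(history, max_messages=20)
print(len(history))  # → 20
```

A production version would also pin any system message at the front before slicing.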

ChatOllama() – Agents, Tools, & State Management

When to use

You need a more sophisticated setup: agents that make decisions, call tools, and maintain state.

Install required packages

pip install langchain-ollama langchain langgraph

Minimal agent that tells the time

from langchain_ollama import ChatOllama
from langchain.tools import tool
from langchain.agents import create_agent

# 1️⃣ Define a tool
@tool
def get_current_time() -> str:
    """Return the current local time as HH:MM:SS."""
    from datetime import datetime
    return datetime.now().strftime("%H:%M:%S")

# 2️⃣ Create the LLM wrapper
llm = ChatOllama(
    model="qwen2.5-coder:latest",
    temperature=0.0,          # deterministic output
)

# 3️⃣ Build the agent
agent = create_agent(
    llm,
    tools=[get_current_time],
    system_prompt="You are a helpful time assistant."
)

# 4️⃣ Invoke it
result = agent.invoke({
    "messages": [{"role": "user", "content": "What time is it right now?"}]
})

print(result["messages"][-1].content)
# → It is currently 14:23:45

What just happened?

  1. You asked the agent for the time.
  2. The agent decided it needed the get_current_time tool.
  3. It called the tool, got the time, and replied politely.

The decision logic lives in the LLM; you only provide the tools.
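Under the hood, tool calling boils down to a dispatch loop: the model names a tool and its arguments, and the framework runs the matching function. A framework-free sketch of that dispatch step (names and registry are illustrative, not LangChain internals):

```python
def get_current_time() -> str:
    return "14:23:45"  # stub; a real tool would use datetime.now()

# Registry mapping tool names to Python callables
TOOLS = {"get_current_time": get_current_time}

def run_tool_call(name: str, args: dict) -> str:
    """Dispatch a model-requested tool call to the matching function."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**args)

# Pretend the LLM decided to call get_current_time with no arguments
print(run_tool_call("get_current_time", {}))  # → 14:23:45
```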

Adding more tools (math example)

from langchain.tools import tool

@tool
def add_numbers(a: int, b: int) -> int:
    """Return a + b."""
    return a + b

@tool
def multiply_numbers(a: int, b: int) -> int:
    """Return a * b."""
    return a * b

# Re‑use the same LLM instance
agent = create_agent(
    llm,
    tools=[add_numbers, multiply_numbers, get_current_time],
    system_prompt="You are a helpful math assistant."
)

# Ask a math question
result = agent.invoke({
    "messages": [{"role": "user", "content": "What's 25 * 4?"}]
})

print(result["messages"][-1].content)
# → 25 * 4 equals 100

The agent automatically chose the multiply_numbers tool.

Persistent Memory (User Preferences, History)

from agent_workspace.hybrid_store import HybridStore
from langchain.tools import tool

# 1️⃣ Persistent storage
store = HybridStore(storage_dir="agent_workspace/storage")

# 2️⃣ Tools that interact with the store
@tool
def save_preference(key: str, value: str, runtime) -> str:
    """Persist a user preference."""
    runtime.store.put(("preferences",), key, {"value": value})
    return f"Saved: {key} = {value}"


@tool
def get_preference(key: str, runtime) -> str:
    """Retrieve a saved preference."""
    pref = runtime.store.get(("preferences",), key)
    if pref:
        return f"Your {key} is {pref.value['value']}"
    return f"No saved value for {key}"

You can now give the agent tools that read/write to a durable store, allowing it to remember user‑specific data across sessions.
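`HybridStore` is the author's own module. If you don't have it, the same (namespace, key) → value idea can be sketched with a small JSON-file-backed store (class and file names here are hypothetical):

```python
import json
from pathlib import Path

class FileStore:
    """Tiny persistent key-value store keyed by (namespace, key)."""

    def __init__(self, path: str = "prefs.json"):
        self.path = Path(path)
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def put(self, namespace: tuple, key: str, value: dict) -> None:
        self.data["/".join(namespace) + "/" + key] = value
        self.path.write_text(json.dumps(self.data))  # persist immediately

    def get(self, namespace: tuple, key: str):
        return self.data.get("/".join(namespace) + "/" + key)

store = FileStore("prefs.json")
store.put(("preferences",), "favorite_color", {"value": "blue"})
print(store.get(("preferences",), "favorite_color"))  # → {'value': 'blue'}
```

Because every `put` writes straight to disk, a new `FileStore("prefs.json")` created after a restart sees the same data.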

Quick Recap

| Feature | `ollama.chat()` | `ChatOllama()` |
|---|---|---|
| Use case | Simple Q&A, streaming answers | Multi-step agents, tool use, state |
| Speed | 15–25 ms | 35–55 ms |
| Complexity | Minimal code | Requires LangChain + tool definitions |
| When to pick | Quick prototyping, single-turn queries | Production-grade agents, memory, tool integration |

Happy building! 🚀

Persistent Preference Agent Example

Below is a minimal, self‑contained example that shows how to:

  1. Save a user preference with a custom tool.
  2. Retrieve the saved preference later – even after the program (or the computer) restarts.

The persistence is handled by HybridStore, which serialises the LangChain runtime store to disk.

# --------------------------------------------------------------
# Imports
# --------------------------------------------------------------
from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama
from agent_workspace.hybrid_store import HybridStore

# --------------------------------------------------------------
# Tools
# --------------------------------------------------------------
@tool
def save_preference(key: str, value: str, runtime) -> str:
    """
    Save a user preference.

    Parameters
    ----------
    key : str
        Identifier for the preference (e.g. "favorite_color").
    value : str
        The value to store.
    runtime : Runtime
        The LangChain runtime that provides access to the store.

    Returns
    -------
    str
        Confirmation message.
    """
    store = runtime.store
    # Store the value under the namespace ("preferences", key)
    store.put(("preferences", key), "value", {"value": value})
    return f"Preference saved: {key} = {value}"


@tool
def get_preference(key: str, runtime) -> str:
    """
    Retrieve a saved preference.

    Parameters
    ----------
    key : str
        Identifier for the preference to fetch.
    runtime : Runtime
        The LangChain runtime that provides access to the store.

    Returns
    -------
    str
        The stored value or a “not found” message.
    """
    store = runtime.store
    pref = store.get(("preferences", key), "value")
    if pref:
        return f"{key} is: {pref.value['value']}"
    return "No preference found"


# --------------------------------------------------------------
# Agent setup
# --------------------------------------------------------------
llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)
store = HybridStore()                     # Persistent storage backend

agent = create_agent(
    llm,
    tools=[save_preference, get_preference],
    store=store,                           # Connect the storage to the agent
    system_prompt="You help manage user preferences."
)


# --------------------------------------------------------------
# Sessions (demonstration)
# --------------------------------------------------------------
# Session 1 – Save a preference
print("=== Session 1 ===")
result1 = agent.invoke({
    "messages": [{"role": "user", "content": "Remember that my favorite color is blue"}]
})
print(result1["messages"][-1].content)

# Session 2 – Retrieve the preference (even after a restart!)
print("\n=== Session 2 (After Restart) ===")
result2 = agent.invoke({
    "messages": [{"role": "user", "content": "What's my favorite color?"}]
})
print(result2["messages"][-1].content)
# Expected output: "Your favorite color is: blue"

The magic:
HybridStore (from the local agent_workspace package) automatically writes the runtime store to a file.
Consequently, any data saved in Session 1 remains available in Session 2, regardless of whether the Python process (or the entire computer) has been restarted.

Simple Python‑Code Assistant

A minimal example that shows how to create a LangChain agent equipped with two tools:

  • check_python_syntax – validates Python code syntax.
  • explain_code – returns a placeholder explanation (replace with an LLM call in a real app).

from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama


@tool
def check_python_syntax(code: str) -> str:
    """Check if the supplied Python code is syntactically valid."""
    try:
        compile(code, "", "exec")
        return "✅ Syntax is valid!"
    except SyntaxError as e:
        return f"❌ Syntax error: {e}"


@tool
def explain_code(code: str) -> str:
    """Provide a simple explanation of what the code does."""
    # In a real application you would call an LLM here.
    return "This code does X, Y, and Z"


# Initialise the LLM (Ollama model) – temperature set to 0 for deterministic output.
llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)

# Build the agent with the two tools defined above.
agent = create_agent(
    llm,
    tools=[check_python_syntax, explain_code],
    system_prompt=(
        "You are a Python code assistant. Help the user write and understand code."
    ),
)


# ---- Usage ---------------------------------------------------------

code = """
def greet(name):
    print(f"Hello, {name}!")
"""

# Ask the agent to validate the snippet.
result = agent.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": f"Is this Python code valid?\n\n{code}",
            }
        ]
    }
)

print(result["messages"][-1].content)
# Example output: "✅ Syntax is valid!"

Data‑Analysis Agent with Persistent Reports

import json
from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama
from agent_workspace.hybrid_store import HybridStore

# Sample sales data
SALES_DATA = [
    {"product": "Laptop",     "sales": 15},
    {"product": "Phone",      "sales": 42},
    {"product": "Tablet",     "sales": 28},
    {"product": "Headphones", "sales": 35},
]

@tool
def get_sales_data() -> str:
    """Return the latest sales data as JSON."""
    return json.dumps(SALES_DATA)


@tool
def save_report(summary: str, runtime) -> str:
    """Save an analysis report."""
    store = runtime.store
    store.put(("reports",), "latest", {"summary": summary})
    return "Report saved!"


@tool
def get_saved_report(runtime) -> str:
    """Retrieve the most recent saved report."""
    store = runtime.store
    report = store.get(("reports",), "latest")
    if report:
        return f"Latest report: {report.value['summary']}"
    return "No report found"


# Initialise LLM and storage
llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)
store = HybridStore()

# Create the agent
agent = create_agent(
    llm,
    tools=[get_sales_data, save_report, get_saved_report],
    store=store,
    system_prompt="You are a data analyst. Help users understand their sales data."
)

# ---- Usage ---------------------------------------------------------

result = agent.invoke(
    {
        "messages": [
            {"role": "user", "content": "Analyze our sales data and give me a summary"}
        ]
    }
)

print(result["messages"][-1].content)
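For reference, the kind of summary the agent is expected to derive can be computed directly from the same data with plain Python (no LLM; the helper name is just for illustration):

```python
SALES_DATA = [
    {"product": "Laptop",     "sales": 15},
    {"product": "Phone",      "sales": 42},
    {"product": "Tablet",     "sales": 28},
    {"product": "Headphones", "sales": 35},
]

def summarize_sales(data: list) -> str:
    """Return total units sold and the best-selling product."""
    total = sum(row["sales"] for row in data)
    top = max(data, key=lambda row: row["sales"])
    return f"Total units: {total}; top seller: {top['product']} ({top['sales']})"

print(summarize_sales(SALES_DATA))
# → Total units: 120; top seller: Phone (42)
```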

Choosing an Ollama Model

| Model | Size | Speed | Capability |
|---|---|---|---|
| Qwen2.5-Coder 1.5B | 1.5 B | ✅ Fast | ⚠️ Less capable |
| Qwen2.5-Coder 7B | 7 B | ✅ Good balance | ✅ Handles most tasks |
| Qwen3-Coder 30B | 30 B | ⚠️ Slower | ✅ Most capable |

# Pull a model (example)
ollama pull qwen2.5-coder:7b

Common ChatOllama Configurations

# Deterministic, short responses
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    temperature=0.0,
    num_predict=128,
)

# More creative, longer output
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    temperature=0.7,
    num_predict=512,
)

# Use GPU layers (if available)
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    num_gpu=35,
)

Troubleshooting

| Problem | Fix |
|---|---|
| Error when using the AI | Open a terminal and run `ollama serve`, then run your Python script in a separate terminal. |
| Model not found | Download it first: `ollama pull qwen2.5-coder:latest`. |
| CUDA out of memory / system slowdown | Switch to a smaller model, e.g. `qwen2.5-coder:7b` or `qwen2.5-coder:1.5b`. |
| Responses take too long | Use a smaller model and limit output length, e.g. `temperature=0.0`, `num_predict=128`. |

What You Can Build

  • ✅ Chat bots
  • ✅ Code assistants
  • ✅ Data‑analysis agents
  • ✅ Personal AI assistants

Happy coding! 🚀
