Source: Dev.to
# LangChain + Ollama: A Practical Guide to Building AI Agents with Python
This guide teaches you how to build real, working AI agents using Ollama and LangChain.
In this guide you’ll discover:
- ✅ How to set up Ollama + LangChain (≈ 10 min)
- ✅ When to use `ollama.chat()` vs. `ChatOllama()` (quick decision tree)
- ✅ How to build agents that remember things (persistent storage)
- ✅ Real, copy‑&‑paste‑ready examples
- ✅ Performance‑tuning tips for your machine
- ✅ How to deploy to production
## Decision Flowchart

```mermaid
flowchart TD
    A[Want to use AI in your Python code?] --> B{Building a multi-step AI agent that makes decisions and uses tools?}
    B -->|YES| C["Use ChatOllama()"]
    B -->|NO| D["Use ollama.chat()"]
    C --> C1["✅ Agents · tools · state management · production"]
    D --> D1["✅ Simple queries · streaming · speed · prototyping"]
```
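If you prefer code to diagrams, the same decision rule can be sketched as a tiny helper (the function name is illustrative, not part of either library):

```python
def pick_api(needs_tools_or_state: bool) -> str:
    """Mirror the flowchart: multi-step agents -> ChatOllama(), everything else -> ollama.chat()."""
    return "ChatOllama()" if needs_tools_or_state else "ollama.chat()"

print(pick_api(True))   # → ChatOllama()
print(pick_api(False))  # → ollama.chat()
```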
## Performance Benchmarks

| Operation | Typical latency | Notes |
|---|---|---|
| `ollama.chat()` response | 15–25 ms | Fastest |
| `ChatOllama()` response | 35–55 ms | More features |
| Streaming first token | 5–20 ms | Real-time feedback |
| Tool execution | 2–12 ms | Overhead varies |
Tip: On a laptop with 8 GB RAM you’ll typically see responses under 100 ms. Cloud APIs usually add 500 ms+ of network latency.
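To see what your own machine does, you can time any call with `time.perf_counter`. The wrapper below is a generic sketch; the lambda is a stand-in so it runs anywhere, and you would swap it for a real `ollama.chat(...)` call:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Stand-in for an ollama.chat(...) call so the sketch runs anywhere.
result, ms = timed(lambda: time.sleep(0.01) or "done")
print(f"{ms:.1f} ms")
```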
## ollama.chat() – Simple Queries & Streaming

### When to use

You just need to ask the AI something and get an answer (or stream the answer).

### Start Ollama

```shell
# Terminal 1: start the Ollama server
ollama serve
```

### Basic request

```python
import ollama

response = ollama.chat(
    model="qwen2.5-coder:latest",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)
print(response["message"]["content"])
# → 2 + 2 equals 4
```
### Streaming (AI “thinking” in real time)

```python
import ollama

print("AI: ", end="", flush=True)
for chunk in ollama.chat(
    model="qwen2.5-coder:latest",
    messages=[{"role": "user", "content": "Write a haiku about code"}],
    stream=True,
):
    print(chunk["message"]["content"], end="", flush=True)
print()  # newline
```

Output:

```text
AI: Lines of logic dance,
Bugs and fixes both take turns—
Code shapes the future.
```
### Conversational loop (context memory)

```python
import ollama

messages = []
while True:
    user_input = input("You: ")
    if not user_input:
        break

    # Add the user message to the history
    messages.append({"role": "user", "content": user_input})

    # Get the model's response to the full history
    response = ollama.chat(
        model="qwen2.5-coder:latest",
        messages=messages,
    )
    ai_reply = response["message"]["content"]
    print(f"\nAI: {ai_reply}\n")

    # Add the AI reply so the model remembers the context
    messages.append({"role": "assistant", "content": ai_reply})
```
### Sample conversation

```text
You: What is a lambda function in Python?
AI: A lambda function is a small anonymous function...
You: How is it different from a regular function?
AI: Great question! The key differences are...
```
The AI keeps track of the conversation because we retain the full messages list.
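Because the full list is resent on every turn, long chats eventually exceed the model's context window. A common mitigation is to keep only the most recent turns; this trimming helper is a sketch, not part of the `ollama` package:

```python
def trim_history(messages: list, max_messages: int = 20) -> list:
    """Keep at most the last max_messages entries of the chat history."""
    return messages[-max_messages:]

history = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
history = trim_history(history)
print(len(history))  # → 20
```

Dropping whole turns is crude but predictable; a fancier variant would summarise old turns instead of discarding them.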
## ChatOllama() – Agents, Tools, & State Management

### When to use

You need a more sophisticated setup: agents that make decisions, call tools, and maintain state.

### Install required packages

```shell
pip install langchain-ollama langchain langgraph
```
### Minimal agent that tells the time

```python
from langchain_ollama import ChatOllama
from langchain.tools import tool
from langchain.agents import create_agent

# 1️⃣ Define a tool
@tool
def get_current_time() -> str:
    """Return the current local time as HH:MM:SS."""
    from datetime import datetime
    return datetime.now().strftime("%H:%M:%S")

# 2️⃣ Create the LLM wrapper
llm = ChatOllama(
    model="qwen2.5-coder:latest",
    temperature=0.0,  # deterministic output
)

# 3️⃣ Build the agent
agent = create_agent(
    llm,
    tools=[get_current_time],
    system_prompt="You are a helpful time assistant.",
)

# 4️⃣ Invoke it
result = agent.invoke({
    "messages": [{"role": "user", "content": "What time is it right now?"}]
})
# The agent state holds a list of messages; the last one is the final answer.
print(result["messages"][-1].content)
# → It is currently 14:23:45
```
### What just happened?

- You asked the agent for the time.
- The agent decided it needed the `get_current_time` tool.
- It called the tool, got the time, and replied politely.

The decision logic lives in the LLM; you only provide the tools.
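Under the hood, the model emits a structured tool call (a name plus arguments) and the agent framework dispatches it to your function. A stripped-down, framework-free sketch of that dispatch step (plain functions stand in for the decorated tools):

```python
from datetime import datetime

# Plain functions standing in for the @tool-decorated versions.
def get_current_time() -> str:
    return datetime.now().strftime("%H:%M:%S")

TOOLS = {"get_current_time": get_current_time}

def dispatch(tool_call: dict) -> str:
    """Look up the tool the model asked for and run it with its arguments."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call.get("args", {}))

# A tool call shaped like the model might emit it:
print(dispatch({"name": "get_current_time"}))
```

The real framework adds schema validation and feeds the tool result back to the model, but the core loop is exactly this lookup-and-call.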
### Adding more tools (math example)

```python
from langchain.tools import tool

@tool
def add_numbers(a: int, b: int) -> int:
    """Return a + b."""
    return a + b

@tool
def multiply_numbers(a: int, b: int) -> int:
    """Return a * b."""
    return a * b

# Re-use the same LLM instance
agent = create_agent(
    llm,
    tools=[add_numbers, multiply_numbers, get_current_time],
    system_prompt="You are a helpful math assistant.",
)

# Ask a math question
result = agent.invoke({
    "messages": [{"role": "user", "content": "What's 25 * 4?"}]
})
print(result["messages"][-1].content)
# → 25 * 4 equals 100
```

The agent automatically chose the `multiply_numbers` tool.
## Persistent Memory (User Preferences, History)

```python
from agent_workspace.hybrid_store import HybridStore
from langchain.tools import tool

# 1️⃣ Persistent storage
store = HybridStore(storage_dir="agent_workspace/storage")

# 2️⃣ Tools that interact with the store
@tool
def save_preference(key: str, value: str, runtime) -> str:
    """Persist a user preference."""
    runtime.store.put(("preferences",), key, {"value": value})
    return f"Saved: {key} = {value}"

@tool
def get_preference(key: str, runtime) -> str:
    """Retrieve a saved preference."""
    pref = runtime.store.get(("preferences",), key)
    if pref:
        return f"Your {key} is {pref['value']}"
    return f"No saved value for {key}"
```
You can now give the agent tools that read/write to a durable store, allowing it to remember user‑specific data across sessions.
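`HybridStore` is project-specific, but the contract it fulfils is small: namespaced `put`/`get` that survives a restart. A toy JSON-file stand-in (hypothetical, not the real `HybridStore` API) makes the idea concrete:

```python
import json
from pathlib import Path

class JsonStore:
    """Toy durable store: one JSON file, keys prefixed by a namespace tuple."""

    def __init__(self, path: str = "prefs.json"):
        self.path = Path(path)
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def put(self, namespace: tuple, key: str, value: dict) -> None:
        self.data["/".join(namespace) + "/" + key] = value
        self.path.write_text(json.dumps(self.data))  # persist immediately

    def get(self, namespace: tuple, key: str):
        return self.data.get("/".join(namespace) + "/" + key)

store = JsonStore("prefs.json")
store.put(("preferences",), "favorite_color", {"value": "blue"})
# Re-opening the file simulates a process restart:
print(JsonStore("prefs.json").get(("preferences",), "favorite_color"))
# → {'value': 'blue'}
```

A real backend would add locking and atomic writes, but the tool code above only ever calls `put` and `get`.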
## Quick Recap

| Feature | `ollama.chat()` | `ChatOllama()` |
|---|---|---|
| Use case | Simple Q&A, streaming answers | Multi-step agents, tool use, state |
| Speed | 15–25 ms | 35–55 ms |
| Complexity | Minimal code | Requires LangChain + tool definitions |
| When to pick | Quick prototyping, single-turn queries | Production-grade agents, memory, tool integration |
Happy building! 🚀
## Persistent Preference Agent Example

Below is a minimal, self-contained example that shows how to:

- Save a user preference with a custom tool.
- Retrieve the saved preference later – even after the program (or the computer) restarts.

The persistence is handled by `HybridStore`, which serialises the LangChain runtime store to disk.
```python
# --------------------------------------------------------------
# Imports
# --------------------------------------------------------------
from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama
from agent_workspace.hybrid_store import HybridStore

# --------------------------------------------------------------
# Tools
# --------------------------------------------------------------
@tool
def save_preference(key: str, value: str, runtime) -> str:
    """
    Save a user preference.

    Parameters
    ----------
    key : str
        Identifier for the preference (e.g. "favorite_color").
    value : str
        The value to store.
    runtime : Runtime
        The LangChain runtime that provides access to the store.

    Returns
    -------
    str
        Confirmation message.
    """
    store = runtime.store
    # Store the value under the namespace ("preferences", key)
    store.put(("preferences", key), "value", {"value": value})
    return f"Preference saved: {key} = {value}"

@tool
def get_preference(key: str, runtime) -> str:
    """
    Retrieve a saved preference.

    Parameters
    ----------
    key : str
        Identifier for the preference to fetch.
    runtime : Runtime
        The LangChain runtime that provides access to the store.

    Returns
    -------
    str
        The stored value or a "not found" message.
    """
    store = runtime.store
    pref = store.get(("preferences", key), "value")
    if pref:
        return f"{key} is: {pref.value['value']}"
    return "No preference found"

# --------------------------------------------------------------
# Agent setup
# --------------------------------------------------------------
llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)
store = HybridStore()  # Persistent storage backend

agent = create_agent(
    llm,
    tools=[save_preference, get_preference],
    store=store,  # Connect the storage to the agent
    system_prompt="You help manage user preferences.",
)

# --------------------------------------------------------------
# Sessions (demonstration)
# --------------------------------------------------------------
# Session 1 – Save a preference
print("=== Session 1 ===")
result1 = agent.invoke({
    "messages": [{"role": "user", "content": "Remember that my favorite color is blue"}]
})
print(result1["messages"][-1].content)

# Session 2 – Retrieve the preference (even after a restart!)
print("\n=== Session 2 (After Restart) ===")
result2 = agent.invoke({
    "messages": [{"role": "user", "content": "What's my favorite color?"}]
})
print(result2["messages"][-1].content)
# Expected output: "Your favorite color is: blue"
```
The magic: `HybridStore` (imported from `agent_workspace.hybrid_store` above) automatically writes the runtime store to a file. Consequently, any data saved in Session 1 remains available in Session 2, regardless of whether the Python process (or the entire computer) has been restarted.
## Simple Python-Code Assistant

A minimal example that shows how to create a LangChain agent equipped with two tools:

- `check_python_syntax` – validates Python code syntax.
- `explain_code` – returns a placeholder explanation (replace with an LLM call in a real app).
```python
from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama

@tool
def check_python_syntax(code: str) -> str:
    """Check if the supplied Python code is syntactically valid."""
    try:
        compile(code, "<string>", "exec")
        return "✅ Syntax is valid!"
    except SyntaxError as e:
        return f"❌ Syntax error: {e}"

@tool
def explain_code(code: str) -> str:
    """Provide a simple explanation of what the code does."""
    # In a real application you would call an LLM here.
    return "This code does X, Y, and Z"

# Initialise the LLM (Ollama model) – temperature set to 0 for deterministic output.
llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)

# Build the agent with the two tools defined above.
agent = create_agent(
    llm,
    tools=[check_python_syntax, explain_code],
    system_prompt=(
        "You are a Python code assistant. Help the user write and understand code."
    ),
)

# ---- Usage ---------------------------------------------------------
code = """
def greet(name):
    print(f"Hello, {name}!")
"""

# Ask the agent to validate the snippet.
result = agent.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": f"Is this Python code valid?\n\n{code}",
            }
        ]
    }
)
print(result["messages"][-1].content)
# Example output: "✅ Syntax is valid!"
```
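The syntax-check tool is ordinary Python underneath; you can exercise the same logic directly, without the agent or any LangChain dependency:

```python
def check_syntax(code: str) -> str:
    """Same check the tool performs: compile without executing."""
    try:
        compile(code, "<string>", "exec")
        return "✅ Syntax is valid!"
    except SyntaxError as e:
        return f"❌ Syntax error: {e}"

print(check_syntax("def greet(name):\n    print(name)"))  # → ✅ Syntax is valid!
print(check_syntax("def broken(:"))  # reports the SyntaxError message
```

`compile(..., "exec")` parses the code but never runs it, so this is safe even for untrusted snippets.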
## Data-Analysis Agent with Persistent Reports

```python
import json
from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama
from agent_workspace.hybrid_store import HybridStore

# Sample sales data
SALES_DATA = [
    {"product": "Laptop", "sales": 15},
    {"product": "Phone", "sales": 42},
    {"product": "Tablet", "sales": 28},
    {"product": "Headphones", "sales": 35},
]

@tool
def get_sales_data() -> str:
    """Return the latest sales data as JSON."""
    return json.dumps(SALES_DATA)

@tool
def save_report(summary: str, runtime) -> str:
    """Save an analysis report."""
    store = runtime.store
    store.put(("reports",), "latest", {"summary": summary})
    return "Report saved!"

@tool
def get_saved_report(runtime) -> str:
    """Retrieve the most recent saved report."""
    store = runtime.store
    report = store.get(("reports",), "latest")
    if report:
        return f"Latest report: {report.value['summary']}"
    return "No report found"

# Initialise LLM and storage
llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)
store = HybridStore()

# Create the agent
agent = create_agent(
    llm,
    tools=[get_sales_data, save_report, get_saved_report],
    store=store,
    system_prompt="You are a data analyst. Help users understand their sales data.",
)

# ---- Usage ---------------------------------------------------------
result = agent.invoke(
    {
        "messages": [
            {"role": "user", "content": "Analyze our sales data and give me a summary"}
        ]
    }
)
print(result["messages"][-1].content)
```
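As a sanity check on what the agent should conclude, the same analysis can be done deterministically in plain Python:

```python
SALES_DATA = [
    {"product": "Laptop", "sales": 15},
    {"product": "Phone", "sales": 42},
    {"product": "Tablet", "sales": 28},
    {"product": "Headphones", "sales": 35},
]

total = sum(row["sales"] for row in SALES_DATA)
best = max(SALES_DATA, key=lambda row: row["sales"])
print(f"Total units: {total}, best seller: {best['product']} ({best['sales']})")
# → Total units: 120, best seller: Phone (42)
```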
## Choosing an Ollama Model

| Model | Size | Speed | Capability |
|---|---|---|---|
| Qwen2.5-Coder 1.5B | 1.5 B | ✅ Fast | ⚠️ Less capable |
| Qwen2.5-Coder 7B | 7 B | ✅ Good balance | ✅ Handles most tasks |
| Qwen3-Coder 30B | 30 B | ⚠️ Slower | ✅ Most capable |

```shell
# Pull a model (example)
ollama pull qwen2.5-coder:7b
```
## Common ChatOllama Configurations

```python
from langchain_ollama import ChatOllama

# Deterministic, short responses
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    temperature=0.0,
    num_predict=128,
)

# More creative, longer output
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    temperature=0.7,
    num_predict=512,
)

# Use GPU layers (if available)
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    num_gpu=35,
)
```
## Troubleshooting

| Problem | Fix |
|---|---|
| Connection error when calling the model | Open a terminal and run `ollama serve`. Then run your Python script in a separate terminal. |
| Model not found | Download it first: `ollama pull qwen2.5-coder:latest`. |
| CUDA out of memory / system slowdown | Switch to a smaller model, e.g. `qwen2.5-coder:7b` or `qwen2.5-coder:1.5b`. |
| Responses take too long | Use a smaller model and limit output length, e.g. `ChatOllama(model="qwen2.5-coder:1.5b", temperature=0.0, num_predict=128)`. |
## What You Can Build

- ✅ Chatbots
- ✅ Code assistants
- ✅ Data-analysis agents
- ✅ Personal AI assistants
Happy coding! 🚀