Source: Dev.to
# LangChain + Ollama: A Practical Guide to Building AI Agents with Python
This guide teaches you how to build real, working AI agents using Ollama and LangChain.
In this guide you’ll discover:
- ✅ How to set up Ollama + LangChain (≈ 10 min)
- ✅ When to use `ollama.chat()` vs. `ChatOllama()` (quick decision tree)
- ✅ How to build agents that remember things (persistent storage)
- ✅ Real, copy‑&‑paste‑ready examples
- ✅ Performance‑tuning tips for your machine
- ✅ How to deploy to production
## Decision Flowchart

```mermaid
flowchart TD
    A[Want to use AI in your Python code?] --> B{Building a multi-step AI agent that makes decisions and uses tools?}
    B -->|YES| C["Use ChatOllama()"]
    B -->|NO| D["Use ollama.chat()"]
    C --> C1["✅ Agents · tools · state management · production"]
    D --> D1["✅ Simple queries · streaming · speed · prototyping"]
```
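If you prefer code to diagrams, the same decision rule can be sketched as a tiny helper (the function name is illustrative, not part of either library):

```python
def pick_api(needs_tools_or_state: bool) -> str:
    """Mirror the flowchart: multi-step agents -> ChatOllama(), everything else -> ollama.chat()."""
    return "ChatOllama()" if needs_tools_or_state else "ollama.chat()"

print(pick_api(True))   # → ChatOllama()
print(pick_api(False))  # → ollama.chat()
```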
## Performance Benchmarks

| Operation | Typical latency | Notes |
|---|---|---|
| `ollama.chat()` response | 15–25 ms | Fastest |
| `ChatOllama()` response | 35–55 ms | More features |
| Streaming first token | 5–20 ms | Real-time feedback |
| Tool execution | 2–12 ms | Overhead varies |
Tip: On a laptop with 8 GB RAM you’ll typically see responses under 100 ms. Cloud APIs usually add 500 ms+ of network latency.
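To see what your own machine does, you can time any call with `time.perf_counter`. The wrapper below is a generic sketch; the lambda is a stand-in so it runs anywhere, and you would swap it for a real `ollama.chat(...)` call:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Stand-in for an ollama.chat(...) call so the sketch runs anywhere.
result, ms = timed(lambda: time.sleep(0.01) or "done")
print(f"{ms:.1f} ms")
```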
## ollama.chat() – Simple Queries & Streaming

### When to use

You just need to ask the AI something and get an answer (or stream the answer).

### Start Ollama

```shell
# Terminal 1: start the Ollama server
ollama serve
```

### Basic request

```python
import ollama

response = ollama.chat(
    model="qwen2.5-coder:latest",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)
print(response["message"]["content"])
# → 2 + 2 equals 4
```
### Streaming (AI “thinking” in real time)

```python
import ollama

print("AI: ", end="", flush=True)
for chunk in ollama.chat(
    model="qwen2.5-coder:latest",
    messages=[{"role": "user", "content": "Write a haiku about code"}],
    stream=True,
):
    print(chunk["message"]["content"], end="", flush=True)
print()  # newline
```

Output:

```text
AI: Lines of logic dance,
Bugs and fixes both take turns—
Code shapes the future.
```
### Conversational loop (context memory)

```python
import ollama

messages = []
while True:
    user_input = input("You: ")
    if not user_input:
        break

    # Add the user message to the history
    messages.append({"role": "user", "content": user_input})

    # Get the model's response to the full history
    response = ollama.chat(
        model="qwen2.5-coder:latest",
        messages=messages,
    )
    ai_reply = response["message"]["content"]
    print(f"\nAI: {ai_reply}\n")

    # Add the AI reply so the model remembers the context
    messages.append({"role": "assistant", "content": ai_reply})
```
### Sample conversation

```text
You: What is a lambda function in Python?
AI: A lambda function is a small anonymous function...
You: How is it different from a regular function?
AI: Great question! The key differences are...
```
The AI keeps track of the conversation because we retain the full messages list.
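Because the full list is resent on every turn, long chats eventually exceed the model's context window. A common mitigation is to keep only the most recent turns; this trimming helper is a sketch, not part of the `ollama` package:

```python
def trim_history(messages: list, max_messages: int = 20) -> list:
    """Keep at most the last max_messages entries of the chat history."""
    return messages[-max_messages:]

history = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
history = trim_history(history)
print(len(history))  # → 20
```

Dropping whole turns is crude but predictable; a fancier variant would summarise old turns instead of discarding them.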
## ChatOllama() – Agents, Tools, & State Management

### When to use

You need a more sophisticated setup: agents that make decisions, call tools, and maintain state.

### Install required packages

```shell
pip install langchain-ollama langchain langgraph
```
### Minimal agent that tells the time

```python
from langchain_ollama import ChatOllama
from langchain.tools import tool
from langchain.agents import create_agent

# 1️⃣ Define a tool
@tool
def get_current_time() -> str:
    """Return the current local time as HH:MM:SS."""
    from datetime import datetime
    return datetime.now().strftime("%H:%M:%S")

# 2️⃣ Create the LLM wrapper
llm = ChatOllama(
    model="qwen2.5-coder:latest",
    temperature=0.0,  # deterministic output
)

# 3️⃣ Build the agent
agent = create_agent(
    llm,
    tools=[get_current_time],
    system_prompt="You are a helpful time assistant.",
)

# 4️⃣ Invoke it
result = agent.invoke({
    "messages": [{"role": "user", "content": "What time is it right now?"}]
})
# The agent state holds a list of messages; the last one is the final answer.
print(result["messages"][-1].content)
# → It is currently 14:23:45
```
### What just happened?

- You asked the agent for the time.
- The agent decided it needed the `get_current_time` tool.
- It called the tool, got the time, and replied politely.

The decision logic lives in the LLM; you only provide the tools.
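Under the hood, the model emits a structured tool call (a name plus arguments) and the agent framework dispatches it to your function. A stripped-down, framework-free sketch of that dispatch step (plain functions stand in for the decorated tools):

```python
from datetime import datetime

# Plain functions standing in for the @tool-decorated versions.
def get_current_time() -> str:
    return datetime.now().strftime("%H:%M:%S")

TOOLS = {"get_current_time": get_current_time}

def dispatch(tool_call: dict) -> str:
    """Look up the tool the model asked for and run it with its arguments."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call.get("args", {}))

# A tool call shaped like the model might emit it:
print(dispatch({"name": "get_current_time"}))
```

The real framework adds schema validation and feeds the tool result back to the model, but the core loop is exactly this lookup-and-call.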
### Adding more tools (math example)

```python
from langchain.tools import tool

@tool
def add_numbers(a: int, b: int) -> int:
    """Return a + b."""
    return a + b

@tool
def multiply_numbers(a: int, b: int) -> int:
    """Return a * b."""
    return a * b

# Re-use the same LLM instance
agent = create_agent(
    llm,
    tools=[add_numbers, multiply_numbers, get_current_time],
    system_prompt="You are a helpful math assistant.",
)

# Ask a math question
result = agent.invoke({
    "messages": [{"role": "user", "content": "What's 25 * 4?"}]
})
print(result["messages"][-1].content)
# → 25 * 4 equals 100
```

The agent automatically chose the `multiply_numbers` tool.
## Persistent Memory (User Preferences, History)

```python
from agent_workspace.hybrid_store import HybridStore
from langchain.tools import tool

# 1️⃣ Persistent storage
store = HybridStore(storage_dir="agent_workspace/storage")

# 2️⃣ Tools that interact with the store
@tool
def save_preference(key: str, value: str, runtime) -> str:
    """Persist a user preference."""
    runtime.store.put(("preferences",), key, {"value": value})
    return f"Saved: {key} = {value}"

@tool
def get_preference(key: str, runtime) -> str:
    """Retrieve a saved preference."""
    pref = runtime.store.get(("preferences",), key)
    if pref:
        return f"Your {key} is {pref['value']}"
    return f"No saved value for {key}"
```
You can now give the agent tools that read/write to a durable store, allowing it to remember user‑specific data across sessions.
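`HybridStore` is project-specific, but the contract it fulfils is small: namespaced `put`/`get` that survives a restart. A toy JSON-file stand-in (hypothetical, not the real `HybridStore` API) makes the idea concrete:

```python
import json
from pathlib import Path

class JsonStore:
    """Toy durable store: one JSON file, keys prefixed by a namespace tuple."""

    def __init__(self, path: str = "prefs.json"):
        self.path = Path(path)
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def put(self, namespace: tuple, key: str, value: dict) -> None:
        self.data["/".join(namespace) + "/" + key] = value
        self.path.write_text(json.dumps(self.data))  # persist immediately

    def get(self, namespace: tuple, key: str):
        return self.data.get("/".join(namespace) + "/" + key)

store = JsonStore("prefs.json")
store.put(("preferences",), "favorite_color", {"value": "blue"})
# Re-opening the file simulates a process restart:
print(JsonStore("prefs.json").get(("preferences",), "favorite_color"))
# → {'value': 'blue'}
```

A real backend would add locking and atomic writes, but the tool code above only ever calls `put` and `get`.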
## Quick Recap

| Feature | `ollama.chat()` | `ChatOllama()` |
|---|---|---|
| Use case | Simple Q&A, streaming answers | Multi-step agents, tool use, state |
| Speed | 15–25 ms | 35–55 ms |
| Complexity | Minimal code | Requires LangChain + tool definitions |
| When to pick | Quick prototyping, single-turn queries | Production-grade agents, memory, tool integration |
Happy building! 🚀
## Persistent Preference Agent Example

Below is a minimal, self-contained example that shows how to:

- Save a user preference with a custom tool.
- Retrieve the saved preference later – even after the program (or the computer) restarts.

The persistence is handled by `HybridStore`, which serialises the LangChain runtime store to disk.
```python
# --------------------------------------------------------------
# Imports
# --------------------------------------------------------------
from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama
from agent_workspace.hybrid_store import HybridStore

# --------------------------------------------------------------
# Tools
# --------------------------------------------------------------
@tool
def save_preference(key: str, value: str, runtime) -> str:
    """
    Save a user preference.

    Parameters
    ----------
    key : str
        Identifier for the preference (e.g. "favorite_color").
    value : str
        The value to store.
    runtime : Runtime
        The LangChain runtime that provides access to the store.

    Returns
    -------
    str
        Confirmation message.
    """
    store = runtime.store
    # Store the value under the namespace ("preferences", key)
    store.put(("preferences", key), "value", {"value": value})
    return f"Preference saved: {key} = {value}"

@tool
def get_preference(key: str, runtime) -> str:
    """
    Retrieve a saved preference.

    Parameters
    ----------
    key : str
        Identifier for the preference to fetch.
    runtime : Runtime
        The LangChain runtime that provides access to the store.

    Returns
    -------
    str
        The stored value or a "not found" message.
    """
    store = runtime.store
    pref = store.get(("preferences", key), "value")
    if pref:
        return f"{key} is: {pref.value['value']}"
    return "No preference found"

# --------------------------------------------------------------
# Agent setup
# --------------------------------------------------------------
llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)
store = HybridStore()  # Persistent storage backend

agent = create_agent(
    llm,
    tools=[save_preference, get_preference],
    store=store,  # Connect the storage to the agent
    system_prompt="You help manage user preferences.",
)

# --------------------------------------------------------------
# Sessions (demonstration)
# --------------------------------------------------------------
# Session 1 – Save a preference
print("=== Session 1 ===")
result1 = agent.invoke({
    "messages": [{"role": "user", "content": "Remember that my favorite color is blue"}]
})
print(result1["messages"][-1].content)

# Session 2 – Retrieve the preference (even after a restart!)
print("\n=== Session 2 (After Restart) ===")
result2 = agent.invoke({
    "messages": [{"role": "user", "content": "What's my favorite color?"}]
})
print(result2["messages"][-1].content)
# Expected output: "Your favorite color is: blue"
```
The magic: `HybridStore` (imported from `agent_workspace.hybrid_store` above) automatically writes the runtime store to a file. Consequently, any data saved in Session 1 remains available in Session 2, regardless of whether the Python process (or the entire computer) has been restarted.
## Simple Python-Code Assistant

A minimal example that shows how to create a LangChain agent equipped with two tools:

- `check_python_syntax` – validates Python code syntax.
- `explain_code` – returns a placeholder explanation (replace with an LLM call in a real app).
```python
from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama

@tool
def check_python_syntax(code: str) -> str:
    """Check if the supplied Python code is syntactically valid."""
    try:
        compile(code, "<string>", "exec")
        return "✅ Syntax is valid!"
    except SyntaxError as e:
        return f"❌ Syntax error: {e}"

@tool
def explain_code(code: str) -> str:
    """Provide a simple explanation of what the code does."""
    # In a real application you would call an LLM here.
    return "This code does X, Y, and Z"

# Initialise the LLM (Ollama model) – temperature set to 0 for deterministic output.
llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)

# Build the agent with the two tools defined above.
agent = create_agent(
    llm,
    tools=[check_python_syntax, explain_code],
    system_prompt=(
        "You are a Python code assistant. Help the user write and understand code."
    ),
)

# ---- Usage ---------------------------------------------------------
code = """
def greet(name):
    print(f"Hello, {name}!")
"""

# Ask the agent to validate the snippet.
result = agent.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": f"Is this Python code valid?\n\n{code}",
            }
        ]
    }
)
print(result["messages"][-1].content)
# Example output: "✅ Syntax is valid!"
```
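The syntax-check tool is ordinary Python underneath; you can exercise the same logic directly, without the agent or any LangChain dependency:

```python
def check_syntax(code: str) -> str:
    """Same check the tool performs: compile without executing."""
    try:
        compile(code, "<string>", "exec")
        return "✅ Syntax is valid!"
    except SyntaxError as e:
        return f"❌ Syntax error: {e}"

print(check_syntax("def greet(name):\n    print(name)"))  # → ✅ Syntax is valid!
print(check_syntax("def broken(:"))  # reports the SyntaxError message
```

`compile(..., "exec")` parses the code but never runs it, so this is safe even for untrusted snippets.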
## Data-Analysis Agent with Persistent Reports

```python
import json
from langchain.tools import tool
from langchain.agents import create_agent
from langchain_ollama import ChatOllama
from agent_workspace.hybrid_store import HybridStore

# Sample sales data
SALES_DATA = [
    {"product": "Laptop", "sales": 15},
    {"product": "Phone", "sales": 42},
    {"product": "Tablet", "sales": 28},
    {"product": "Headphones", "sales": 35},
]

@tool
def get_sales_data() -> str:
    """Return the latest sales data as JSON."""
    return json.dumps(SALES_DATA)

@tool
def save_report(summary: str, runtime) -> str:
    """Save an analysis report."""
    store = runtime.store
    store.put(("reports",), "latest", {"summary": summary})
    return "Report saved!"

@tool
def get_saved_report(runtime) -> str:
    """Retrieve the most recent saved report."""
    store = runtime.store
    report = store.get(("reports",), "latest")
    if report:
        return f"Latest report: {report.value['summary']}"
    return "No report found"

# Initialise LLM and storage
llm = ChatOllama(model="qwen2.5-coder:latest", temperature=0.0)
store = HybridStore()

# Create the agent
agent = create_agent(
    llm,
    tools=[get_sales_data, save_report, get_saved_report],
    store=store,
    system_prompt="You are a data analyst. Help users understand their sales data.",
)

# ---- Usage ---------------------------------------------------------
result = agent.invoke(
    {
        "messages": [
            {"role": "user", "content": "Analyze our sales data and give me a summary"}
        ]
    }
)
print(result["messages"][-1].content)
```
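As a sanity check on what the agent should conclude, the same analysis can be done deterministically in plain Python:

```python
SALES_DATA = [
    {"product": "Laptop", "sales": 15},
    {"product": "Phone", "sales": 42},
    {"product": "Tablet", "sales": 28},
    {"product": "Headphones", "sales": 35},
]

total = sum(row["sales"] for row in SALES_DATA)
best = max(SALES_DATA, key=lambda row: row["sales"])
print(f"Total units: {total}, best seller: {best['product']} ({best['sales']})")
# → Total units: 120, best seller: Phone (42)
```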
## Choosing an Ollama Model

| Model | Size | Speed | Capability |
|---|---|---|---|
| Qwen2.5-Coder 1.5B | 1.5 B | ✅ Fast | ⚠️ Less capable |
| Qwen2.5-Coder 7B | 7 B | ✅ Good balance | ✅ Handles most tasks |
| Qwen3-Coder 30B | 30 B | ⚠️ Slower | ✅ Most capable |

```shell
# Pull a model (example)
ollama pull qwen2.5-coder:7b
```
## Common ChatOllama Configurations

```python
from langchain_ollama import ChatOllama

# Deterministic, short responses
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    temperature=0.0,
    num_predict=128,
)

# More creative, longer output
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    temperature=0.7,
    num_predict=512,
)

# Use GPU layers (if available)
llm = ChatOllama(
    model="qwen2.5-coder:7b",
    num_gpu=35,
)
```
## Troubleshooting

| Problem | Fix |
|---|---|
| Connection error when calling the model | Open a terminal and run `ollama serve`. Then run your Python script in a separate terminal. |
| Model not found | Download it first: `ollama pull qwen2.5-coder:latest`. |
| CUDA out of memory / system slowdown | Switch to a smaller model, e.g. `qwen2.5-coder:7b` or `qwen2.5-coder:1.5b`. |
| Responses take too long | Use a smaller model and limit output length, e.g. `ChatOllama(model="qwen2.5-coder:1.5b", temperature=0.0, num_predict=128)`. |
## What You Can Build

- ✅ Chatbots
- ✅ Code assistants
- ✅ Data-analysis agents
- ✅ Personal AI assistants
Happy coding! 🚀