Using Ollama Web Search API in Python
Source: Dev.to
Getting Started
Installation
Install the ollama Python package, version 0.6.0 or higher:
pip install 'ollama>=0.6.0'
Environment setup
For managing Python environments and packages, consider using uv or a virtual environment with venv.
Create an API key from your Ollama account and set it as an environment variable:
export OLLAMA_API_KEY="your_api_key"
On Windows PowerShell:
$env:OLLAMA_API_KEY = "your_api_key"
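The client reads the API key from the `OLLAMA_API_KEY` environment variable, so it is worth confirming the variable is visible to your Python process before making any calls. A minimal sanity check using only the standard library (the wording of the printed messages is illustrative):

```python
import os

# The ollama client expects OLLAMA_API_KEY in the environment for web search calls
key = os.environ.get("OLLAMA_API_KEY")
if key:
    print(f"OLLAMA_API_KEY is set ({len(key)} characters)")
else:
    print("OLLAMA_API_KEY is not set; web search requests will fail to authenticate")
```

A missing key typically only surfaces as an authentication error at request time, so checking up front saves a confusing round trip.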
Basic Web Search
import ollama
# Simple web search
response = ollama.web_search("What is Ollama?")
print(response)
Sample output
results = [
    {
        "title": "Ollama",
        "url": "https://ollama.com/",
        "content": "Cloud models are now available in Ollama..."
    },
    {
        "title": "What is Ollama? Features, Pricing, and Use Cases",
        "url": "https://www.walturn.com/insights/what-is-ollama",
        "content": "Our services..."
    },
    {
        "title": "Complete Ollama Guide: Installation, Usage & Code Examples",
        "url": "https://collabnix.com/complete-ollama-guide",
        "content": "Join our Discord Server..."
    }
]
Controlling Result Count
import ollama
# Get more results
response = ollama.web_search("latest AI news", max_results=10)
for result in response.results:
    print(f"📌 {result.title}")
    print(f"   {result.url}")
    print(f"   {result.content[:100]}...")
    print()
Fetching Full Page Content
web_search returns multiple search results with titles, URLs, and snippets.
web_fetch retrieves the full content of a specific URL, returning the page title, markdown content, and links.
from ollama import web_fetch
result = web_fetch('https://ollama.com')
print(result)
Sample output
WebFetchResponse(
    title='Ollama',
    content='[Cloud models](https://ollama.com/blog/cloud-models) are now available in Ollama\n\n**Chat & build with open models**\n\n[Download](https://ollama.com/download) [Explore models](https://ollama.com/models)\n\nAvailable for macOS, Windows, and Linux',
    links=['https://ollama.com/', 'https://ollama.com/models', 'https://github.com/ollama/ollama']
)
For more on converting HTML to Markdown, see our guide on converting HTML to Markdown with Python.
Combining Search and Fetch
A common pattern is to search first, then fetch full content from relevant results:
from ollama import web_search, web_fetch
# Search for information
search_results = web_search("Ollama new features 2025")
# Fetch full content from the first result
if search_results.results:
    first_url = search_results.results[0].url
    full_content = web_fetch(first_url)
    print(f"Title: {full_content.title}")
    print(f"Content: {full_content.content[:500]}...")
    print(f"Links found: {len(full_content.links)}")
Building a Search Agent
Models with strong tool‑use capabilities work best, such as qwen3, gpt-oss, and cloud models like qwen3:480b-cloud and deepseek-v3.1-cloud. See the guide on LLMs with Structured Output using Ollama and Qwen3 for advanced use cases.
First, pull a capable model:
ollama pull qwen3:4b
Simple Search Agent
A basic search agent that can autonomously decide when to search:
from ollama import chat, web_fetch, web_search
available_tools = {'web_search': web_search, 'web_fetch': web_fetch}
messages = [{'role': 'user', 'content': "what is ollama's new engine"}]
while True:
    response = chat(
        model='qwen3:4b',
        messages=messages,
        tools=[web_search, web_fetch],
        think=True
    )

    if response.message.thinking:
        print('🧠 Thinking:', response.message.thinking[:200], '...')
    if response.message.content:
        print('💬 Response:', response.message.content)

    messages.append(response.message)

    if response.message.tool_calls:
        print('🔧 Tool calls:', response.message.tool_calls)
        for tool_call in response.message.tool_calls:
            function_to_call = available_tools.get(tool_call.function.name)
            if function_to_call:
                args = tool_call.function.arguments
                result = function_to_call(**args)
                print('📥 Result:', str(result)[:200], '...')
                # Truncate result for limited context lengths
                messages.append({
                    'role': 'tool',
                    'content': str(result)[:2000 * 4],
                    'tool_name': tool_call.function.name
                })
            else:
                messages.append({
                    'role': 'tool',
                    'content': f'Tool {tool_call.function.name} not found',
                    'tool_name': tool_call.function.name
                })
    else:
        break
Handling large web search results
Truncate results to fit context limits (≈ 8000 characters, i.e., 2000 tokens × 4 chars) before passing them to the model.
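That truncation can be factored into a small helper so every tool result goes through the same cap. This is a sketch: `MAX_TOOL_CHARS` and the `[truncated]` marker are illustrative choices, not part of the Ollama API.

```python
MAX_TOOL_CHARS = 2000 * 4  # ~2000 tokens at a rough 4 characters per token

def truncate_tool_result(result, limit=MAX_TOOL_CHARS):
    """Stringify a tool result and cap its length to fit the model's context."""
    text = str(result)
    if len(text) <= limit:
        return text
    # Mark the cut so the model knows the content is incomplete
    return text[:limit] + "\n[truncated]"
```

In the agent loop, `'content': str(result)[:2000 * 4]` then becomes `'content': truncate_tool_result(result)`, which also flags to the model when content was cut off.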
Advanced Search Agent with Error Handling
An enhanced version with better error handling:
from ollama import chat, web_fetch, web_search
import json
class SearchAgent:
    def __init__(self, model: str = 'qwen3:4b'):
        self.model = model
        self.tools = {'web_search': web_search, 'web_fetch': web_fetch}
        self.messages = []
        self.max_iterations = 10

    def query(self, question: str) -> str:
        self.messages = [{'role': 'user', 'content': question}]
        for iteration in range(self.max_iterations):
            try:
                response = chat(
                    model=self.model,
                    messages=self.messages,
                    tools=[web_search, web_fetch],
                    think=True
                )
            except Exception as e:
                return f"Error during chat: {e}"

            self.messages.append(response.message)

            # If no tool calls, we have a final answer
            if not response.message.tool_calls:
                return