How to Stream AI Agent Responses in 5 Min
Source: Dev.to
The Code
import asyncio

from agents import Agent, Runner, function_tool
from openai.types.responses import ResponseTextDeltaEvent


@function_tool
def lookup_price(ticker: str) -> str:
    """Look up the current price of a stock."""
    prices = {"AAPL": "$198.50", "GOOG": "$176.30", "TSLA": "$245.10"}
    return prices.get(ticker.upper(), f"No data for {ticker}")


agent = Agent(
    name="StockAssistant",
    instructions="You help users check stock prices. Use the lookup_price tool.",
    tools=[lookup_price],
)


async def main():
    result = Runner.run_streamed(agent, input="What's the price of AAPL and GOOG?")
    async for event in result.stream_events():
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)
        elif event.type == "run_item_stream_event":
            if event.item.type == "tool_call_item":
                print("\n>> Calling tool...")
            elif event.item.type == "tool_call_output_item":
                print(f">> Tool returned: {event.item.output}")
            elif event.item.type == "message_output_item":
                pass  # Already streaming via raw events above
    print()  # Final newline


if __name__ == "__main__":
    asyncio.run(main())

Install and Run
pip install openai-agents
export OPENAI_API_KEY="sk-..."
python stream_agent.py
What You’ll See
>> Calling tool...
>> Tool returned: $198.50
>> Calling tool...
>> Tool returned: $176.30
Apple (AAPL) is currently at $198.50 and Alphabet (GOOG) is at $176.30. Both are showing...
Tokens appear one by one as the agent generates them. Tool calls show up the moment they happen, not after the entire run completes.
How It Works
Runner.run_streamed() replaces Runner.run(). Instead of blocking until the agent finishes, it returns a result object immediately. You consume events from result.stream_events() as an async iterator.
Event Types
- raw_response_event – fires for every raw streaming event from the LLM, including each generated token. Filter for ResponseTextDeltaEvent to grab the actual text deltas. Print them with end="" and flush=True for a token-by-token effect.
- run_item_stream_event – fires when a complete item is generated (tool call, tool output, or finished message). Use this to show “Calling tool…” progress indicators.
- agent_updated_stream_event – fires when the current agent changes (during handoffs). It can be ignored for single-agent setups.
The key insight: raw events give you real‑time text, while item events give you structured milestones. Use both together for the best UX.
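To make the division of labor between the three event types concrete, here is a minimal sketch of a single dispatch function. It is an illustration, not the SDK's own code: `describe_event` is a hypothetical helper, the events are hand-built stand-ins rather than real SDK objects, and the text-delta check compares the event's `type` string (`"response.output_text.delta"`) instead of using `isinstance`, so the snippet runs without an API key.

```python
from types import SimpleNamespace


def describe_event(event):
    """Return the text to display for one stream event, or None to skip it."""
    if event.type == "raw_response_event":
        # Duck-typed stand-in for isinstance(event.data, ResponseTextDeltaEvent):
        # text deltas carry the type string "response.output_text.delta".
        if getattr(event.data, "type", None) == "response.output_text.delta":
            return event.data.delta
        return None  # Other raw events carry no printable text
    if event.type == "run_item_stream_event":
        if event.item.type == "tool_call_item":
            return "\n>> Calling tool...\n"
        if event.item.type == "tool_call_output_item":
            return f">> Tool returned: {event.item.output}\n"
        return None  # message_output_item: text already arrived as raw deltas
    if event.type == "agent_updated_stream_event":
        return f"\n>> Now running: {event.new_agent.name}\n"
    return None


# Hand-built stand-in events (assumptions for the demo, not real SDK objects).
fake_events = [
    SimpleNamespace(type="agent_updated_stream_event",
                    new_agent=SimpleNamespace(name="StockAssistant")),
    SimpleNamespace(type="run_item_stream_event",
                    item=SimpleNamespace(type="tool_call_item")),
    SimpleNamespace(type="run_item_stream_event",
                    item=SimpleNamespace(type="tool_call_output_item", output="$198.50")),
    SimpleNamespace(type="raw_response_event",
                    data=SimpleNamespace(type="response.output_text.delta", delta="Apple ")),
]

parts = [describe_event(e) for e in fake_events]
rendered = "".join(p for p in parts if p is not None)
print(rendered)
```

Returning a string (or None) instead of printing inside the function keeps the routing logic separate from the output channel, which makes it easier to reuse in a CLI or a web handler.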
Streaming vs. Blocking: The Difference
- Blocking (Runner.run()) – a two-tool-call agent produces 10–15 seconds of silence, then the complete response.
- Streaming (Runner.run_streamed()) – users see tool calls within 1–2 seconds and text appearing token-by-token immediately after.
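The gap between the two modes is really a gap in time to first output. The sketch below simulates it with plain asyncio and no SDK calls: `fake_stream`, `fake_blocking_run`, and the 0.05-second delay are made-up stand-ins chosen for illustration, not measurements of a real agent.

```python
import asyncio
import time

CHUNKS = ["Apple ", "is at ", "$198.50."]
DELAY = 0.05  # Made-up per-chunk latency, for illustration only


async def fake_stream():
    """Stand-in for a streamed run: yields each chunk as soon as it is ready."""
    for chunk in CHUNKS:
        await asyncio.sleep(DELAY)
        yield chunk


async def fake_blocking_run():
    """Stand-in for a blocking run: returns only once every chunk is done."""
    return "".join([chunk async for chunk in fake_stream()])


async def time_to_first_output():
    """Return seconds until the first visible output for (blocking, streaming)."""
    start = time.perf_counter()
    await fake_blocking_run()
    blocking = time.perf_counter() - start

    start = time.perf_counter()
    streaming = None
    async for _ in fake_stream():
        if streaming is None:
            streaming = time.perf_counter() - start  # First chunk arrived
    return blocking, streaming


if __name__ == "__main__":
    blocking, streaming = asyncio.run(time_to_first_output())
    print(f"blocking: {blocking:.2f}s, streaming: {streaming:.2f}s")
```

Total run time is the same in both cases; only the moment the user first sees something changes.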
For CLI apps, the code above works as‑is. For web apps, replace print() with writes to a Server‑Sent Events stream or WebSocket.
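For the web case, one possible shape is to wrap each text delta in a Server-Sent Events frame before handing it to your web framework's streaming response. The following is a sketch under assumptions: `sse_frame`, `to_sse`, the `delta` and `done` event names, and the `[DONE]` sentinel are conventions invented for this example, and `fake_deltas` stands in for the deltas you would pull out of result.stream_events().

```python
import asyncio


def sse_frame(data, event=None):
    """Encode one SSE frame; multi-line data gets one 'data:' line per line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # A blank line terminates the frame


async def to_sse(deltas):
    """Turn an async iterator of text deltas into SSE frames, then signal the end."""
    async for delta in deltas:
        yield sse_frame(delta, event="delta")
    yield sse_frame("[DONE]", event="done")  # Hypothetical end-of-stream marker


async def fake_deltas():
    """Stand-in for the text deltas extracted from the agent's event stream."""
    for piece in ["Apple ", "is at\n", "$198.50"]:
        yield piece


async def collect():
    """Gather all frames into a list so they can be inspected."""
    return [frame async for frame in to_sse(fake_deltas())]


if __name__ == "__main__":
    for frame in asyncio.run(collect()):
        print(repr(frame))
```

Splitting on newlines matters because a raw newline inside a `data:` line would end the field early; emitting one `data:` line per line of text lets the browser's EventSource reassemble the original delta.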
Quick Tips
- Don’t mix streaming with .final_output. result.final_output is only available after the stream is fully consumed. Read events first, then access it.
- Skip non-text raw events. Many raw events carry no text delta at all; the isinstance check filters those out so only ResponseTextDeltaEvent chunks get printed.
- Tool approval works too. If a tool requires human approval, the stream pauses at that point. Check result.interruptions after the stream ends.
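The first tip can be sketched as a small helper that drains the stream and only then touches final_output. Everything here is illustrative: `stream_then_finish` is a hypothetical helper, and `FakeStreamedResult` is a stand-in that mimics the behavior described above (final output filled in only once the stream ends) so the snippet runs without the SDK or an API key.

```python
import asyncio
from types import SimpleNamespace

TEXT_DELTA = "response.output_text.delta"  # Type string of text delta events


async def stream_then_finish(result, on_delta):
    """Consume every event first, then read final_output once the stream is done."""
    async for event in result.stream_events():
        data = getattr(event, "data", None)
        if event.type == "raw_response_event" and getattr(data, "type", None) == TEXT_DELTA:
            on_delta(data.delta)
    return result.final_output  # Read only after the loop above has finished


class FakeStreamedResult:
    """Hypothetical stand-in for the streamed result object, for this demo only."""

    def __init__(self, pieces):
        self._pieces = pieces
        self.final_output = None  # Not populated until the stream is consumed

    async def stream_events(self):
        for piece in self._pieces:
            yield SimpleNamespace(
                type="raw_response_event",
                data=SimpleNamespace(type=TEXT_DELTA, delta=piece),
            )
        self.final_output = "".join(self._pieces)


if __name__ == "__main__":
    fake = FakeStreamedResult(["AAPL ", "is at ", "$198.50"])
    final = asyncio.run(stream_then_finish(fake, lambda d: print(d, end="", flush=True)))
    print(f"\nfinal_output: {final}")
```

With the real SDK you would pass the object returned by Runner.run_streamed() instead of the fake; the ordering (iterate first, read final_output second) is the part that carries over.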
Next Steps
Combine streaming with the patterns from earlier articles in this series.