How to Stream AI Agent Responses in 5 Min

Published: March 16, 2026 at 05:01 PM EDT
3 min read
Source: Dev.to

The Code

import asyncio
from agents import Agent, Runner, function_tool
from openai.types.responses import ResponseTextDeltaEvent

@function_tool
def lookup_price(ticker: str) -> str:
    """Look up the current price of a stock."""
    prices = {"AAPL": "$198.50", "GOOG": "$176.30", "TSLA": "$245.10"}
    return prices.get(ticker.upper(), f"No data for {ticker}")

agent = Agent(
    name="StockAssistant",
    instructions="You help users check stock prices. Use the lookup_price tool.",
    tools=[lookup_price],
)

async def main():
    result = Runner.run_streamed(agent, input="What's the price of AAPL and GOOG?")

    async for event in result.stream_events():
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)
        elif event.type == "run_item_stream_event":
            if event.item.type == "tool_call_item":
                print("\n>> Calling tool...")
            elif event.item.type == "tool_call_output_item":
                print(f">> Tool returned: {event.item.output}")
            elif event.item.type == "message_output_item":
                pass  # Already streaming via raw events above

    print()  # Final newline

if __name__ == "__main__":
    asyncio.run(main())

Install and Run

pip install openai-agents
export OPENAI_API_KEY="sk-..."
python stream_agent.py

What You’ll See

>> Calling tool...
>> Tool returned: $198.50
>> Calling tool...
>> Tool returned: $176.30
Apple (AAPL) is currently at $198.50 and Alphabet (GOOG) is at $176.30. Both are showing...

Tokens appear one by one as the agent generates them. Tool calls show up the moment they happen — not after the entire run completes.

How It Works

Runner.run_streamed() replaces Runner.run(). Instead of blocking until the agent finishes, it returns a result object immediately. You consume events from result.stream_events() as an async iterator.
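To make the difference concrete without calling the API, here is a minimal stdlib-only sketch. `fake_stream` is a hypothetical stand-in for `result.stream_events()`, not part of the SDK; it only illustrates how consuming an async iterator lets you show each chunk as soon as it is produced, rather than after the whole run.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_stream(log: List[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for result.stream_events(); yields text chunks."""
    for token in ["Hello", ", ", "world"]:
        await asyncio.sleep(0)  # hand control back, as a network read would
        log.append(f"produced {token!r}")
        yield token


async def blocking_style() -> List[str]:
    """Collect every chunk first, then show the full text once."""
    log: List[str] = []
    text = "".join([t async for t in fake_stream(log)])
    log.append(f"shown {text!r}")
    return log


async def streaming_style() -> List[str]:
    """Show each chunk as soon as it arrives."""
    log: List[str] = []
    async for t in fake_stream(log):
        log.append(f"shown {t!r}")
    return log


blocking_log = asyncio.run(blocking_style())
streaming_log = asyncio.run(streaming_style())
print(blocking_log)
print(streaming_log)
```

In the blocking log, every "produced" entry comes before the single "shown" entry. In the streaming log, the two alternate, which is the behavior the streamed runner gives you.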

Event Types

  • raw_response_event – passes through the raw streaming events from the LLM, including one for each text delta. Filter for ResponseTextDeltaEvent to grab the actual text deltas, and print them with end="" and flush=True for a token‑by‑token effect.
  • run_item_stream_event – fires when a complete item is generated (tool call, tool output, or finished message). Use this to show “Calling tool…” progress indicators.
  • agent_updated_stream_event – fires when the current agent changes (during handoffs). It can be ignored for single‑agent setups.

The key insight: raw events give you real‑time text, while item events give you structured milestones. Use both together for the best UX.
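The routing logic above can be isolated into a small function and exercised without a network call. The sketch below uses hypothetical stand-in classes (`FakeTextDelta`, `FakeItem`, `FakeEvent`) that only mimic the attribute shapes used in the main example; they are not SDK types, and in real code the isinstance check would target ResponseTextDeltaEvent.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class FakeTextDelta:
    """Stand-in for ResponseTextDeltaEvent (assumption: only .delta is used)."""
    delta: str


@dataclass
class FakeItem:
    """Stand-in for a run item with a type and an optional tool output."""
    type: str
    output: Any = None


@dataclass
class FakeEvent:
    """Stand-in for a stream event with the fields the main example reads."""
    type: str
    data: Any = None
    item: Any = None


def render(events: Iterable[FakeEvent]) -> List[str]:
    """Route events: item events become milestone lines, text deltas are merged."""
    lines: List[str] = []
    text: List[str] = []
    for event in events:
        if event.type == "raw_response_event" and isinstance(event.data, FakeTextDelta):
            text.append(event.data.delta)
        elif event.type == "run_item_stream_event":
            if event.item.type == "tool_call_item":
                lines.append(">> Calling tool...")
            elif event.item.type == "tool_call_output_item":
                lines.append(f">> Tool returned: {event.item.output}")
        # agent_updated_stream_event and non-text raw events fall through
    if text:
        lines.append("".join(text))
    return lines


demo = [
    FakeEvent("run_item_stream_event", item=FakeItem("tool_call_item")),
    FakeEvent("run_item_stream_event", item=FakeItem("tool_call_output_item", "$198.50")),
    FakeEvent("raw_response_event", data=FakeTextDelta("AAPL is ")),
    FakeEvent("raw_response_event", data=FakeTextDelta("at $198.50.")),
    FakeEvent("raw_response_event", data=object()),  # non-text raw event, skipped
    FakeEvent("agent_updated_stream_event"),
]
print(render(demo))
```

Keeping the routing in a pure function like this makes it easy to unit-test the display logic separately from the agent run.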

Streaming vs. Blocking: The Difference

  • Blocking (Runner.run()) – nothing is shown until the run finishes; with a two‑tool‑call agent, that can mean 10–15 seconds of silence before the complete response appears.
  • Streaming (Runner.run_streamed()) – users typically see tool calls within 1–2 seconds and text appearing token‑by‑token immediately after.

For CLI apps, the code above works as‑is. For web apps, replace print() with writes to a Server‑Sent Events stream or WebSocket.
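For the web case, one possible shape is to turn each text delta into a Server‑Sent Events frame. The helper names below (`sse_frame`, `to_sse`) and the `delta`/`done` event names are assumptions made for illustration; the frame layout (`event:` and `data:` lines ending in a blank line) follows the SSE wire format. JSON-encoding the payload keeps raw newlines out of the `data:` line.

```python
import json
from typing import Iterable, Iterator


def sse_frame(payload: dict, event: str = "message") -> str:
    """Format one SSE frame; json.dumps escapes newlines inside the payload."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def to_sse(deltas: Iterable[str]) -> Iterator[str]:
    """Wrap each text delta in a frame, then signal completion."""
    for delta in deltas:
        yield sse_frame({"delta": delta}, event="delta")
    yield sse_frame({}, event="done")


frames = list(to_sse(["Apple is ", "at\n$198.50"]))
for frame in frames:
    print(frame, end="")
```

In a real handler, the deltas would come from the streaming loop in place of print(), and each frame would be written to a response served as text/event-stream.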

Quick Tips

  • Don’t mix streaming with .final_output. result.final_output is only available after the stream is fully consumed. Read events first, then access it.
  • Expect raw events without text. Raw events also cover non‑text updates, and the isinstance check skips anything that is not a ResponseTextDeltaEvent, so only text deltas reach print().
  • Tool approval works too. If a tool requires human approval, the stream pauses at that point. Check result.interruptions after the stream ends.
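The first tip is about ordering: drain the stream, then read the final value. The sketch below models that with a hypothetical `FakeStreamedResult` class, which is not the SDK's result type; it only imitates the behavior described above, where the final output is filled in once the event stream has been fully consumed.

```python
import asyncio
from typing import AsyncIterator, List, Optional


class FakeStreamedResult:
    """Stand-in for a streamed result (assumption: not the SDK class)."""

    def __init__(self, chunks: List[str]) -> None:
        self._chunks = chunks
        self.final_output: Optional[str] = None  # unset until the stream ends

    async def stream_events(self) -> AsyncIterator[str]:
        parts: List[str] = []
        for chunk in self._chunks:
            await asyncio.sleep(0)  # yield control, as a network read would
            parts.append(chunk)
            yield chunk
        self.final_output = "".join(parts)  # set only after the last chunk


async def consume(result: FakeStreamedResult) -> str:
    assert result.final_output is None  # too early: nothing consumed yet
    async for chunk in result.stream_events():
        print(chunk, end="", flush=True)
    print()
    return result.final_output  # safe to read now


final = asyncio.run(consume(FakeStreamedResult(["Hel", "lo"])))
```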

Next Steps

Combine streaming with the patterns from earlier articles in this series.

Building agents that need streaming, tools, and orchestration out of the box? Nebula handles the infrastructure so you can focus on the logic.
