When a Model Gets Stuck: How GPT‑5.2 Finished a 'Simple' Spinner That Opus 4.5 Couldn't

Published: January 6, 2026 at 11:54 AM EST
7 min read
Source: Dev.to

Feature Request: “This should be simple”

“Sometimes the LLM response stalls mid‑sentence. Show a basic spinner when that happens, and remove it when more text arrives.”

In a web UI a stall is annoying; in a terminal it looks like a crash.
What we needed was a subtle “⋯ waiting for more” indicator: nothing flashy, just a clear signal that the system is alive and waiting.


The Scene: Simulated Streaming Meets Real‑World Stalls

Aye Chat streams responses into a terminal UI using Rich (Live).
We also simulate streaming: even if a provider returns large chunks, we animate the output word‑by‑word so it feels like “typing”.

Real providers behave like this:

  1. You receive some tokens.
  2. There’s a gap (LLM is thinking, server‑side pause, back‑pressure).
  3. Streaming resumes.

If that gap occurs mid‑sentence, the user sees a frozen UI.
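
You can reproduce the effect locally with a fake provider that pauses mid‑sentence (a test sketch, not Aye Chat code; the timings are arbitrary):

import time
from typing import Iterator

def fake_stream() -> Iterator[str]:
    """Yield chunks like a provider that stalls mid-sentence."""
    yield "Here is the first half of a sentence that suddenly "
    time.sleep(3.0)  # server-side pause: nothing arrives for a while
    yield "continues once the model resumes streaming."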

Four deceptively clean requirements

  • Detect a stall while streaming.
  • Show “⋯ waiting for more” only during stalls.
  • Remove it immediately when new text arrives.
  • Don’t break Markdown or final formatting.

Architecture (and Why It Can Lie to You)

At a high level the streaming UI has three moving parts:

  • update(content: str) – called when new streamed content arrives (the full accumulated content, not a delta).
  • _animate_words(new_text: str) – prints newly received text word‑by‑word with a small delay.
  • Background monitor thread – periodically decides whether we are “stalled”.

Rendering is done via a helper like:

from rich.markdown import Markdown
from rich.panel import Panel

def _create_response_panel(
    content: str,
    use_markdown: bool = True,
    show_stall_indicator: bool = False,
) -> Panel:
    if show_stall_indicator:
        content += "\n⋯ waiting for more"
    # Markdown only when asked; plain text is cheaper while streaming.
    body = Markdown(content) if use_markdown else content
    return Panel(body)

When show_stall_indicator=True, the spinner line is appended.
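
A quick way to see it for yourself (assuming the helper above):

from rich.console import Console

Console().print(
    _create_response_panel("Hello, wor", show_stall_indicator=True)
)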


What “Stalled” Actually Means

There are two kinds of stalls:

  1. Network stall – no new content is arriving from the LLM.
  2. User‑visible stall – nothing is changing on screen (the UI has caught up to the latest data).

These are not the same in a system that deliberately delays rendering.
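
In other words, they are separate predicates, and only their conjunction should drive the indicator. A sketch, using the attributes the article introduces later:

network_stalled = (time.time() - self._last_receive_time) >= self._stall_threshold
ui_caught_up = (not self._is_animating) and (
    self._animated_content == self._current_content
)
user_visible_stall = network_stalled and ui_caught_up  # only now show "⋯"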


Where Opus 4.5 Got Stuck: Fixing Symptoms Instead of the Machine

Claude Opus 4.5 tackled the first part quickly:

  • add a timestamp,
  • monitor elapsed time,
  • show the indicator if a threshold is exceeded.

It worked… until it didn’t. The spinner blinked briefly even while words were still printing.

Why?
The stall detector was looking at time since the last network update, while the UI was still busy animating buffered words. Opus tried to add an _is_animating flag to suppress the indicator, but the problem persisted.

The real issue was two concurrent writers to the same UI:

  • The animation path calls Live.update() as it prints words.
  • The monitor thread calls Live.update() as it toggles the indicator.

Without serialization, an inconsistent intermediate frame is rendered, which appears as a blinking spinner.
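
You can watch this class of bug in isolation with a contrived repro: two threads fighting over one Live. This is illustrative code, not the app’s; Rich won’t crash, but the frame flickers between the two panels:

import threading
import time

from rich.live import Live
from rich.panel import Panel

def writer(live: Live, label: str) -> None:
    # Each thread keeps swapping in its own frame; last writer wins.
    for _ in range(40):
        live.update(Panel(label))
        time.sleep(0.02)

with Live(Panel("starting…")) as live:
    t1 = threading.Thread(target=writer, args=(live, "animating words"))
    t2 = threading.Thread(target=writer, args=(live, "⋯ waiting for more"))
    t1.start()
    t2.start()
    t1.join()
    t2.join()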

Opus was stuck in a local optimum:

  • treat it as a timing problem,
  • treat it as a single boolean,
  • keep adding guards.

What we actually needed was state + synchronization.


What GPT‑5.2 Did Differently: Treat It Like a State Machine with a Single Renderer

GPT‑5.2 didn’t win by being clever; it won by being strict. It introduced three decisive changes.

1️⃣ Serialize Shared State and All UI Updates

Create a lock:

import threading

self._lock = threading.RLock()

Rule: Any code that touches shared state or calls Live.update() must hold the lock.

Centralize rendering in a single helper:

def _refresh_display(
    self,
    use_markdown: bool = False,
    show_stall: bool = False,
) -> None:
    with self._lock:
        if not self._live:
            return

        self._live.update(
            _create_response_panel(
                self._animated_content,
                use_markdown=use_markdown,
                show_stall_indicator=show_stall,
            )
        )
        self._showing_stall_indicator = show_stall

Now only one thread can modify the UI at a time, eliminating the “blinking because two threads fought for the frame buffer” class of bugs.
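
The animation path then renders through the same helper. The article doesn’t show the post‑fix _animate_words, so this is a sketch (self._word_delay is a hypothetical name):

def _animate_words(self, new_text: str) -> None:
    for word in new_text.split():
        with self._lock:
            self._animated_content += word + " "
            # RLock is reentrant, so calling the rendering helper
            # while already holding the lock is safe.
            self._refresh_display(show_stall=False)
        time.sleep(self._word_delay)  # the "typing" effect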

2️⃣ Redefine “Stall” as “Caught Up and No New Input”

A stall should only be possible when:

  1. We are not currently animating, and
  2. The animated output has caught up to what we have received.

caught_up = (not self._is_animating) and (
    self._animated_content == self._current_content
)

If caught_up is True and a configurable timeout has elapsed, we consider the system stalled and show the indicator.

3️⃣ Keep the Indicator Purely UI‑Side

The monitor thread now only sets a desired stall state; the actual rendering decision lives in _refresh_display. This separation guarantees that the spinner appears exactly when the UI is idle and disappears the moment new text arrives.

def _monitor_stall(self):
    while self._running:
        time.sleep(self._check_interval)
        with self._lock:
            if self._should_show_stall():
                self._refresh_display(show_stall=True)
            else:
                self._refresh_display(show_stall=False)
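
_should_show_stall() itself is never shown in the article; a minimal version consistent with the definitions above might look like this (it assumes the caller already holds the lock, and it uses the receive timestamp introduced in a later fix):

def _should_show_stall(self) -> bool:
    # Caller holds self._lock.
    caught_up = (not self._is_animating) and (
        self._animated_content == self._current_content
    )
    if not caught_up:
        return False
    return (time.time() - self._last_receive_time) >= self._stall_threshold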

Result: A Boring‑But‑Correct Spinner

  • No flickering.
  • No Markdown corruption.
  • Accurate “waiting for more” signal that only shows when the UI is truly idle.

The whole episode turned a “simple” feature request into a lesson about state machines, synchronization, and the importance of modeling the whole system—not just its symptoms.

Fixing the “indicator shows while words are still printing” Bug

The caught_up definition above is what resolves the original problem: the stall indicator lit up even while buffered words were still being printed.

Note: If the UI is still draining buffered words, you’re not stalled – you’re busy.

Use “last receive time” instead of “last render time”

After fixing the first issue we encountered a second, subtler bug:

When streaming is actually paused, the indicator blinks instead of staying lit.

This is a classic mistake in real‑time UI code: if a redraw updates the same timestamp that measures progress, then showing the spinner itself counts as “progress”, the stall clock resets, and the next tick hides the spinner again. The indicator cancels itself.
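
The buggy shape, reconstructed (not the project’s actual code):

# BUG: the redraw itself counts as "progress".
with self._lock:
    self._live.update(panel_with_spinner)
    self._last_update_time = time.time()  # resets the stall clock, so the
                                          # next tick hides the spinner again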

The solution (GPT‑5.2)

Separate the concepts:

# Only updated when new stream content arrives
self._last_receive_time: float = 0.0

Update it only in update() when the content truly changes:

def update(self, content: str) -> None:
    with self._lock:
        if content == self._current_content:
            return  # nothing new arrived; leave the stall clock alone
        self._current_content = content
        self._last_receive_time = time.time()

The monitor then checks:

time_since_receive = time.time() - self._last_receive_time
should_show_stall = time_since_receive >= self._stall_threshold

Result: the indicator becomes “sticky” in the correct way:

  • it turns on after the threshold,
  • it stays on continuously,
  • it turns off immediately when new text arrives.

The Final Monitor Loop (the boring version that works)

def _monitor_stall(self) -> None:
    # Event.wait() doubles as both the sleep and the shutdown check.
    while not self._stop_monitoring.wait(0.5):
        with self._lock:
            if not self._started or not self._animated_content:
                continue

            # Never show the indicator while buffered words are draining.
            caught_up = (not self._is_animating) and (
                self._animated_content == self._current_content
            )
            if not caught_up:
                continue

            time_since_receive = time.time() - self._last_receive_time
            should_show_stall = time_since_receive >= self._stall_threshold

            # Redraw only on a state change, and only through the single
            # rendering helper (which also guards against a missing Live).
            if should_show_stall != self._showing_stall_indicator:
                self._refresh_display(show_stall=should_show_stall)

Key properties

  • No indicator while buffered words are still animating.
  • Indicator appears only after no new content arrives for stall_threshold.
  • Indicator stays on continuously once shown.
  • Indicator disappears immediately when new text arrives.
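
Wiring it up is the usual daemon‑thread pattern (a sketch; only _stop_monitoring and _monitor_stall come from the article):

import threading

self._stop_monitoring = threading.Event()
self._monitor_thread = threading.Thread(
    target=self._monitor_stall,
    daemon=True,  # never keep the process alive just for a spinner
)
self._monitor_thread.start()

# ...and on shutdown:
self._stop_monitoring.set()
self._monitor_thread.join()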

The spinner stops being a “feature” and becomes infrastructure – exactly what terminal UX should be.


The Real Theme: Why Swapping Models Is a Debugging Tool

I’m not interested in “model wars,” but I am interested in the practical reality of building with them:

  • Opus 4.5 – quickly produces a plausible implementation and cleans up structure when asked, but tends to circle around incremental fixes.
  • GPT‑5.2 – steps back, sees the “two writers + ambiguous stall definition” problem, and forces a solution into a small state machine with serialized rendering.

This doesn’t mean one model is “better” in the abstract; it means it’s more useful for a particular debugging situation. When a model loops, change the conversation shape—or change the model. In Aye Chat, switching models is cheap, and “cheap” matters when you’re stuck on a UI race condition that reproduces only 1 in 10 times.


Takeaways (and how they match Aye Chat’s philosophy)

  • A spinner has a bigger correctness surface area than it deserves.
    Animation + monitoring + concurrent rendering is a real system.

  • “Stall” is a state, not a timeout.
    It must mean “caught up and no new input,” not “some time passed.”

  • Don’t let rendering update the clock that decides whether to render.
    That’s how you invent blinking.

  • Locking isn’t optional when multiple threads can render.
    Even if nothing crashes, the UX will suffer.

  • Model choice is part of the toolchain.
    When one model gets stuck in local fixes, another might see the global shape.

In a weird way, this tiny “⋯ waiting for more” indicator teaches the same lesson as the optimistic workflow:

  1. Let the system move fast,
  2. Build it so you can recover instantly,
  3. Be pragmatic about the tools (including the model) that get you unstuck.

About Aye Chat

Aye Chat is an open‑source, AI‑powered terminal workspace that brings AI directly into command‑line workflows. Edit files, run commands, and chat with your codebase without leaving the terminal.


Support Us

  • ⭐ Star our GitHub repository
  • Spread the word. Share Aye Chat with your team and friends who live in the terminal.