Chasing 240 FPS in LLM Chat UIs
Source: Dev.to
TL;DR
I built a benchmark suite to test various optimizations for streaming LLM responses in a React UI. Key takeaways:
- Build a proper state model first, then optimize rendering later. Ideally, keep state outside React (e.g., with Zustand) and adapt it to React later.
- Don’t focus on React re-renders or hooks; prioritize windowing. Gains from memoization, useTransition, useDeferredValue, etc., are minimal compared to windowing techniques.
- Optimize the Critical Rendering Path (CRP) with CSS properties like content-visibility: auto and contain: content. If you use animations, add will-change.
- Avoid returning raw markdown from the LLM when possible. Parsing and rendering markdown is expensive. If you must, split the response into plain-text and markdown segments and only parse the markdown parts (see the sketch after this list).
- Throttle the stream if network speed isn’t a bottleneck. Adding a tiny delay (5–10 ms) between words can help maintain higher FPS.
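To make the markdown tip concrete, here is a rough sketch of the segmentation idea. looksLikeMarkdown, renderMarkdown, and renderPlainText are hypothetical helpers, and a real heuristic would depend on what your model actually emits.

// Hypothetical sketch: classify paragraphs so only markdown-looking segments hit the parser.
type Segment = { kind: "plain" | "markdown"; text: string };

// Crude heuristic (assumption): headings, lists, fences, bold, or inline code markers.
function looksLikeMarkdown(text: string): boolean {
  return /(^|\n)\s*(#{1,6}\s|[-*]\s|\d+\.\s|```)|\*\*|`/.test(text);
}

function segmentResponse(response: string): Segment[] {
  // Split on blank lines so each paragraph is classified independently.
  return response.split(/\n{2,}/).map((text) => ({
    kind: looksLikeMarkdown(text) ? "markdown" : "plain",
    text,
  }));
}

// Rendering side (hypothetical helpers): only markdown segments pay the parsing cost.
// segments.map(s => s.kind === "markdown" ? renderMarkdown(s.text) : renderPlainText(s.text));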
How LLM Streaming Works
LLMs stream responses in chunks via Server‑Sent Events (SSE) rather than sending a complete answer at once.
// Basic LLM stream handling with fetch
async function streamLLMResponse(url) {
  const response = await fetch(url);
  const reader = response.body.getReader();
  // Reuse one decoder so multi-byte characters split across chunks decode correctly
  const decoder = new TextDecoder();
  let done = false;
  while (!done) {
    const { value, done: streamDone } = await reader.read();
    if (streamDone) {
      done = true;
      break;
    }
    const chunk = decoder.decode(value, { stream: true });
    // Process the chunk (can be one or multiple words)
    console.log(chunk);
  }
}
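The snippet above treats the body as raw text. With a real SSE endpoint, each chunk contains data: lines that need to be parsed. Here is a sketch under the assumption of an OpenAI-style format (data: payloads ending with a [DONE] sentinel); other providers differ.

// Sketch: parsing SSE "data:" lines; the [DONE] sentinel and payload shape are assumptions.
async function streamSSE(url: string, onToken: (data: string) => void) {
  const response = await fetch(url);
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE events are separated by a blank line; keep any incomplete event in the buffer.
    const events = buffer.split("\n\n");
    buffer = events.pop() ?? "";
    for (const event of events) {
      for (const line of event.split("\n")) {
        if (!line.startsWith("data:")) continue;
        const data = line.slice(5).trim();
        if (data === "[DONE]") return; // Common end-of-stream marker, not universal
        onToken(data); // Or JSON.parse(data) if the server sends JSON payloads
      }
    }
  }
}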
Handling the Chunk in React – Why It Lags
A naïve approach updates React state for every incoming chunk:
import { useState } from "react";

function ChatResponse() {
  const [response, setResponse] = useState("");

  async function streamLLMResponse(url) {
    const res = await fetch(url);
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value, { stream: true });
      setResponse(prev => prev + chunk); // Update state with each chunk
    }
  }

  return <p>{response}</p>;
}
As the chat history grows, each state update triggers a re‑render of the entire component tree, causing FPS to drop dramatically (often to 0 FPS within seconds).
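As the TL;DR notes, memoization alone yielded only minor gains in the benchmarks, but it illustrates the problem: only the in-flight message actually changes, yet every message re-renders. A minimal sketch of isolating completed messages (component and prop names are illustrative):

import { memo } from "react";

// Completed messages never change, so memo() lets React skip them on every chunk update.
const MessageItem = memo(function MessageItem({ text }: { text: string }) {
  return <div className="chat-message">{text}</div>;
});

function ChatHistory({ messages, streamingText }: { messages: string[]; streamingText: string }) {
  return (
    <div>
      {messages.map((text, i) => (
        <MessageItem key={i} text={text} />
      ))}
      {/* Only this node changes while the stream is in flight */}
      <div className="chat-message">{streamingText}</div>
    </div>
  );
}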

Optimizations
The benchmark suite includes:
- A minimal Node + TypeScript server that streams words with a configurable delay (simulating an LLM stream); a rough sketch follows after this list.
- A React + Vite + TypeScript frontend implementing various optimizations.
- Performance measured in FPS during streaming; RAM usage varies per optimization.
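For reference, here is a rough sketch of what such a word-streaming server could look like. It uses Node's built-in http module; the word list, port, and delay value are placeholders rather than the benchmark's actual configuration.

import http from "node:http";
import { setTimeout as sleep } from "node:timers/promises";

const WORDS = "Lorem ipsum dolor sit amet consectetur adipiscing elit".split(" "); // Placeholder text
const DELAY_MS = 10; // Configurable delay between words

http
  .createServer(async (_req, res) => {
    res.writeHead(200, {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-cache",
      "Access-Control-Allow-Origin": "*", // The Vite dev server runs on a different origin
    });
    for (const word of WORDS) {
      res.write(word + " "); // Each write reaches the client as one or more chunks
      await sleep(DELAY_MS);
    }
    res.end();
  })
  .listen(3001, () => console.log("Streaming on http://localhost:3001"));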
RAF Batching
Buffer incoming chunks and flush them to state once per animation frame using requestAnimationFrame (RAF).
import { useRef, useState } from "react";

function ChatResponse() {
  const [response, setResponse] = useState("");
  const bufferRef = useRef("");

  async function streamLLMResponse(url) {
    const res = await fetch(url);
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      bufferRef.current += decoder.decode(value, { stream: true }); // Collect chunk
      requestAnimationFrame(() => {
        if (!bufferRef.current) return; // An earlier callback already flushed this frame
        const buffered = bufferRef.current; // Capture before clearing; the updater runs later
        bufferRef.current = "";
        setResponse(prev => prev + buffered); // Flush at most once per frame
      });
    }
  }

  return <p>{response}</p>;
}
Result: Min FPS ≈ 15 after 90 seconds—noticeable improvement but still not ideal.

React 18 startTransition
startTransition marks updates as non‑urgent, allowing React to prioritize urgent UI work (e.g., user input) over streaming text updates.
import { startTransition, useState } from "react";

function SearchBox({ filterList }) { // filterList: an expensive filtering helper, defined elsewhere
  const [inputValue, setInputValue] = useState("");
  const [filteredList, setFilteredList] = useState([]);

  function handleInputChange(e) {
    const value = e.target.value;
    setInputValue(value); // Urgent: keep the input responsive
    startTransition(() => {
      setFilteredList(filterList(value)); // Non-urgent: React may interrupt this work
    });
  }

  return <input value={inputValue} onChange={handleInputChange} />;
}
Caveat: If chunks arrive faster than React can yield, the main thread may still become saturated, limiting the benefit of startTransition.
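The snippet above is the classic input-filtering example. Applied to streaming, the idea is to mark the text append itself as non-urgent. Here is a sketch combining it with the RAF flush from the previous section (same assumptions as that snippet):

// Inside the RAF callback from the batching example above:
requestAnimationFrame(() => {
  if (!bufferRef.current) return;
  const buffered = bufferRef.current;
  bufferRef.current = "";
  startTransition(() => {
    setResponse(prev => prev + buffered); // Non-urgent: typing and clicks stay responsive
  });
});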
Additional Tips
- Windowing / Virtualization: Render only the visible portion of the chat history (e.g., using react-window or react-virtualized). This dramatically reduces DOM size and repaint cost; see the sketch after this list.
- CSS Optimizations: apply content-visibility: auto and contain: content to each message (e.g., .chat-message { content-visibility: auto; contain: content; }), and use will-change for animated elements to hint the browser about upcoming changes.
- Chunk Throttling: Introducing a small artificial delay (5–10 ms) between processing chunks can smooth out rendering spikes without noticeably affecting perceived responsiveness.
- State Management Outside React: Libraries like Zustand let you keep a mutable store that React reads from less frequently, reducing the number of state updates that trigger re-renders.
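Here is a minimal windowing sketch using react-window's FixedSizeList (1.x API). Chat messages usually have variable heights, so a real implementation would likely need VariableSizeList or measured rows; the pixel sizes below are placeholders.

import { FixedSizeList } from "react-window";
import type { CSSProperties } from "react";

// Only the rows that intersect the scroll viewport are mounted in the DOM.
function ChatHistory({ messages }: { messages: string[] }) {
  return (
    <FixedSizeList
      height={600}      // Viewport height in px (placeholder)
      width="100%"
      itemCount={messages.length}
      itemSize={96}     // Row height in px (placeholder; real messages vary)
    >
      {({ index, style }: { index: number; style: CSSProperties }) => (
        <div style={style} className="chat-message">
          {messages[index]}
        </div>
      )}
    </FixedSizeList>
  );
}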
By combining RAF batching, windowing, CRP‑focused CSS, and careful state management, you can approach a stable 60 FPS experience—and, on capable hardware, even push toward 240 FPS in LLM chat UIs.