Chasing 240 FPS in LLM Chat UIs

Published: December 15, 2025 at 12:52 PM EST
4 min read
Source: Dev.to

TL;DR

I built a benchmark suite to test various optimizations for streaming LLM responses in a React UI. Key takeaways:

  1. Get the state model right first, then optimize rendering. Ideally, keep the streaming state outside React (e.g., in a Zustand store) and connect it to components afterwards.
  2. Don’t focus on React re‑renders or hooks; prioritize windowing. Gains from memoization, useTransition, useDeferredValue, etc., are minimal compared to windowing techniques.
  3. Optimize the Critical Rendering Path (CRP) with CSS properties like content-visibility: auto and contain: content. If you use animations, add will-change.
  4. Avoid returning raw markdown from the LLM when possible. Parsing and rendering markdown is expensive. If you must, split the response into plain‑text and markdown segments and only parse the markdown parts.
  5. Throttle the stream if network speed isn’t a bottleneck. Adding a tiny delay (5–10 ms) between words can help maintain higher FPS.

How LLM Streaming Works

LLMs stream responses in chunks via Server‑Sent Events (SSE) rather than sending a complete answer at once.

// Basic LLM stream handling with fetch
async function streamLLMResponse(url) {
  const response = await fetch(url);
  const reader = response.body.getReader();
  const decoder = new TextDecoder(); // Reuse one decoder so multi-byte characters survive chunk boundaries

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true });
    // Process the chunk (can be one or multiple words)
    console.log(chunk);
  }
}
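
Most providers frame those chunks as SSE `data:` lines. A minimal parsing sketch, assuming an OpenAI-style `data: {...}` / `data: [DONE]` format (a real parser would also buffer partial lines that span chunk boundaries):

// Sketch: extract data payloads from one decoded SSE chunk
// Assumes each complete line looks like "data: {...}" or "data: [DONE]"
function parseSSEChunk(chunk) {
  const events = [];
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;   // Skip comments and blank lines
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break;              // End-of-stream sentinel
    events.push(JSON.parse(payload));             // One parsed event per data line
  }
  return events;
}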

Handling the Chunk in React – Why It Lags

A naïve approach updates React state for every incoming chunk:

import { useState } from "react";

// Inside a chat component
const [response, setResponse] = useState("");

async function streamLLMResponse(url) {
  const res = await fetch(url); // Renamed so it doesn't shadow the `response` state
  const reader = res.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true });
    setResponse(prev => prev + chunk); // Update state with every chunk
  }
}

As the chat history grows, each state update triggers a re‑render of the entire component tree, causing FPS to drop dramatically (often to 0 FPS within seconds).

Laggy React state updates

Optimizations

The benchmark suite includes:

  • A minimal Node + TypeScript server that streams words with a configurable delay (simulating an LLM stream).
  • A React + Vite + TypeScript frontend implementing various optimizations.
  • Performance measured in FPS during streaming; RAM usage varies per optimization.

RAF Batching

Buffer incoming chunks and flush them to state once per animation frame using requestAnimationFrame (RAF).

import { useRef, useState } from "react";

// Inside a chat component
const [response, setResponse] = useState("");
const bufferRef = useRef("");
const rafRef = useRef(0);

async function streamLLMResponse(url) {
  const res = await fetch(url);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;

    bufferRef.current += decoder.decode(value, { stream: true }); // Collect chunk

    if (!rafRef.current) {
      // Schedule at most one flush per animation frame
      rafRef.current = requestAnimationFrame(() => {
        rafRef.current = 0;
        const text = bufferRef.current; // Snapshot before clearing, so the async
        bufferRef.current = "";         // state updater doesn't read an empty buffer
        if (text) setResponse(prev => prev + text); // Flush once per frame
      });
    }
  }
}

Result: Min FPS ≈ 15 after 90 seconds—noticeable improvement but still not ideal.

RAF batching performance

React 18 startTransition

startTransition marks updates as non‑urgent, allowing React to prioritize urgent UI work (e.g., user input) over streaming text updates.

import { startTransition, useState } from "react";

// Inside a component
const [inputValue, setInputValue] = useState("");
const [filteredList, setFilteredList] = useState([]);

function handleInputChange(e) {
  const value = e.target.value;
  setInputValue(value); // Urgent: keep the input responsive

  startTransition(() => {
    setFilteredList(filterList(value)); // Non-urgent: filterList stands in for any expensive update
  });
}

Caveat: If chunks arrive faster than React can yield, the main thread may still become saturated, limiting the benefit of startTransition.
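
Applied to streaming, the same idea is to mark the text append itself as non-urgent. A minimal sketch, reusing the reader loop from the earlier examples:

import { startTransition, useState } from "react";

// Inside a chat component: same loop as before, but the append is a transition
const [response, setResponse] = useState("");

async function streamLLMResponse(url) {
  const res = await fetch(url);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value, { stream: true });
    startTransition(() => {
      setResponse(prev => prev + chunk); // Non-urgent: user input stays responsive
    });
  }
}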

Additional Tips

  • Windowing / Virtualization: Render only the visible portion of the chat history (e.g., using react-window or react-virtualized). This dramatically reduces DOM size and repaint cost.
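
    A minimal sketch using react-window v1's FixedSizeList (assumes fixed-height rows; real chat bubbles usually need a variable-size list or a measurement step):

    import { FixedSizeList } from "react-window";

    // Render only the rows currently in view (sketch; names are illustrative)
    function ChatHistory({ messages }) {
      return (
        <FixedSizeList
          height={600}      // Viewport height in px
          width="100%"
          itemCount={messages.length}
          itemSize={80}     // Fixed row height in px
        >
          {({ index, style }) => (
            <div style={style} className="chat-message">
              {messages[index]}
            </div>
          )}
        </FixedSizeList>
      );
    }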

  • CSS Optimizations:

    .chat-message {
      content-visibility: auto;
      contain: content;
    }

    Use will-change for animated elements to hint the browser about upcoming changes.

  • Chunk Throttling: Introducing a small artificial delay (5–10 ms) between processing chunks can smooth out rendering spikes without noticeably affecting perceived responsiveness.
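
    A minimal sketch of the delay in the reader loop (the 8 ms value is just an example):

    // Sketch: pause briefly between processed chunks
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    async function processChunks(reader, onChunk) {
      const decoder = new TextDecoder();
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        onChunk(decoder.decode(value, { stream: true }));
        await sleep(8); // ~5-10 ms keeps frames from piling up
      }
    }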

  • State Management Outside React: Libraries like Zustand keep the streaming text in a store outside React state; components subscribe only to the slices they render, so far fewer updates trigger re-renders.
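
    A minimal Zustand sketch (store and component names are illustrative):

    import { create } from "zustand";

    // The streaming text lives in a store outside React
    const useChatStore = create((set) => ({
      response: "",
      append: (chunk) => set((state) => ({ response: state.response + chunk })),
    }));

    // Components subscribe only to the slice they render
    function ResponseView() {
      const response = useChatStore((state) => state.response);
      return <div className="chat-message">{response}</div>;
    }

    // The stream handler appends without going through React state:
    // useChatStore.getState().append(chunk);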

By combining RAF batching, windowing, CRP‑focused CSS, and careful state management, you can approach a stable 60 FPS experience—and, on capable hardware, even push toward 240 FPS in LLM chat UIs.
