Why Your Browser Benchmark is Lying to You About AI Performance
Source: Dev.to
The Shift from “Document Web” to “Compute Web”
For years, we’ve measured web performance through the lens of latency:
- How fast does this script load?
- How quickly can the engine execute this single loop?
However, the Document Web era is over. We are now living in the Compute Web era, where browsers are expected to:
- Run local AI inference
- Process massive data streams
- Handle complex UI states simultaneously
Traditional benchmarks are like testing a 16‑cylinder engine by checking the speed of a single piston. They don’t tell you how the engine performs under actual load.
Why Existing Benchmarks Fall Short
Most benchmarks (e.g., JetStream, Speedometer) focus on sequential execution. While they’re great for measuring JavaScript engine maturity and overall browser performance, they fail to account for Task Saturation.
Peak performance in a modern AI‑driven web app is all about how efficiently the browser can orchestrate concurrent, resource‑intensive tasks, such as:
| Category | Example |
|---|---|
| CPU‑Intensive Work | Pre‑processing large datasets or 50 MB JSON payloads in a Web Worker |
| GPU‑Intensive Work | Running a local AI inference model using WebGPU |
| Main‑Thread Work | Keeping the UI responsive at 60 fps |
The true bottleneck is often the handoff and scheduling between these components, not the raw speed of any single one. Focusing on an isolated variable (e.g., raw GPU speed) fails to capture the Ultimate Performance of an application under real‑world conditions.
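A minimal sketch of why scheduling dominates: a synchronous loop stands in for CPU work and a timer stands in for an asynchronous GPU inference. The function names and timings are illustrative, not SpeedPower.run's actual code; the point is that the overlapped run is bounded by the slower task, while the sequential run pays for both.

```javascript
function cpuTask(iterations) {
  // Synchronous CPU-bound work (stand-in for pre-processing a large JSON payload).
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc += Math.sqrt(i);
  return acc;
}

function gpuTask(ms) {
  // Asynchronous work that completes off the CPU (stand-in for a WebGPU inference).
  return new Promise((resolve) => setTimeout(() => resolve("inference done"), ms));
}

async function sequential() {
  const t0 = Date.now();
  await gpuTask(100); // wait for the "GPU" first...
  cpuTask(3e7);       // ...then do the CPU work
  return Date.now() - t0;
}

async function overlapped() {
  const t0 = Date.now();
  const gpu = gpuTask(100); // kick off the async job first...
  cpuTask(3e7);             // ...and do CPU work while it runs
  await gpu;
  return Date.now() - t0;
}

async function main() {
  const seq = await sequential();
  const conc = await overlapped();
  console.log({ seq, conc }); // the overlapped run finishes sooner
  return { seq, conc };
}
```

Good orchestration is exactly this overlap, multiplied across workers, backends, and the main thread.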
Introducing SpeedPower.run
At ScaleDynamics, we observed that the handoff between CPU and GPU is the primary bottleneck in AI‑driven web apps. Frustrated by synthetic, disconnected benchmarks, we built SpeedPower.run—the definitive benchmark for real‑world compute performance on the modern web.
Simultaneous Load Methodology
SpeedPower.run determines your browser and device’s maximum performance by pushing all CPUs and GPUs to their limits simultaneously.
- Concurrent tasks: AI inferences run while heavy JavaScript processing occurs.
- Technologies used: JavaScript, WASM, WebGL, WebGPU.
Ensuring a Fair Score (Methodology & Integrity)
- Zero Network Interference – The timer starts only after all large assets (≈ 400 MB of AI models) are loaded into memory.
- Warm‑up Execution Phase – Allows the browser to finish internal optimizations (e.g., code compilation) before the final score is recorded.
- Score Stability – Statistical regression analysis on peak metrics smooths out system‑level scheduling noise, producing a dependable result rather than a single‑moment snapshot.
- User Guidance – Run the test multiple times to capture the highest possible score, as OS‑level factors can affect performance.
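The warm-up and smoothing steps above can be sketched as a small timing harness. This is illustrative only: the trimmed mean here is a simple stand-in for the statistical regression analysis the benchmark actually performs.

```javascript
// Sketch of the fairness methodology: discard warm-up runs so JIT
// compilation settles, then smooth scheduler noise by trimming outliers
// before averaging. Not SpeedPower.run's code; parameters are illustrative.

function benchmark(task, { warmups = 5, samples = 20, trim = 0.2 } = {}) {
  // Warm-up phase: results are thrown away.
  for (let i = 0; i < warmups; i++) task();

  // Measurement phase.
  const times = [];
  for (let i = 0; i < samples; i++) {
    const t0 = performance.now();
    task();
    times.push(performance.now() - t0);
  }

  // Score stability: drop the fastest and slowest `trim` fraction of runs,
  // then average the rest, so the score isn't a single-moment snapshot.
  times.sort((a, b) => a - b);
  const drop = Math.floor(times.length * trim);
  const kept = times.slice(drop, times.length - drop);
  return kept.reduce((sum, t) => sum + t, 0) / kept.length; // mean time in ms
}
```

Usage: `benchmark(() => JSON.parse(payload))` yields a stable per-run time even when the OS scheduler occasionally preempts a sample.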
Core Benchmarks
1. JavaScript
Measures raw computational power for pre/post‑processing on JS objects and JSON.
- Uses four tests from the Apple/WebKit JetStream 2 suite:
- Access Binary Trees
- Control Flow Recursive
- Regexp DNA
- String Tag Cloud
- Runs these benchmarks in parallel across multiple Web Workers to gauge maximum multi‑core CPU processing power.
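The fan-out across workers can be sketched with a hypothetical `partition` helper that splits a fixed iteration budget into one chunk per worker; in the browser, each chunk would be posted to a Web Worker running one of the JetStream kernels, with the worker count typically taken from `navigator.hardwareConcurrency`.

```javascript
// Illustrative chunking logic for saturating every core: split `total`
// iterations into `workers` near-equal chunks. The first `remainder`
// chunks each take one extra iteration so nothing is dropped.

function partition(total, workers) {
  const base = Math.floor(total / workers);
  const remainder = total % workers;
  return Array.from({ length: workers }, (_, i) => base + (i < remainder ? 1 : 0));
}

console.log(partition(10, 4)); // → [3, 3, 2, 2]
```

Saturating all cores this way is what separates a multi-core score from the single-piston measurements of traditional suites.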
2. AI with TensorFlow.js
AI Recognition (TFJS)
- Model: BlazeFace
- Input: 128 × 128 tensor, pre‑warmed graph
- Measures: Forward‑pass speed + post‑processing (decoding highest‑confidence face detection) across backends (JS, WASM, WebGL, WebGPU).
AI Classification (TFJS)
- Model: MobileNetV3‑Small
- Input: 224 × 224 tensor, pre‑warmed graph
- Measures: Same as above, focusing on classification throughput.
3. AI with Transformers.js (v3)
AI Classification (Transformers)
- Model: MobileNetV4‑Small
- Backend: Prioritizes high‑performance WebGPU (fallback to WebGL)
- Input: Fixed 224 × 224 tensor
- Measures: Parallel inference capacity using asynchronous command queues and compute shaders.
AI LLM (Transformers)
- Model: SmolLM2‑135M‑Instruct (causal language model)
- Format: 4‑bit quantized (q4) ONNX
- Measures: GPU runtime efficiency isolated from model‑loading overhead; multi‑threaded LLM execution and real‑time autoregressive decoding.
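Autoregressive decoding, the loop this benchmark times, can be sketched with a toy greedy decoder. Here `nextLogits` is a hypothetical stand-in for the quantized model's forward pass; the real cost the benchmark measures is that each new token requires one full pass.

```javascript
// Toy greedy autoregressive decoding: each step runs one forward pass,
// picks the highest-scoring token, and feeds it back as input.

function argmax(logits) {
  let best = 0;
  for (let i = 1; i < logits.length; i++) if (logits[i] > logits[best]) best = i;
  return best;
}

function decode(nextLogits, promptTokens, maxNewTokens, eosToken) {
  const tokens = [...promptTokens];
  for (let i = 0; i < maxNewTokens; i++) {
    const token = argmax(nextLogits(tokens)); // one forward pass per token
    tokens.push(token);
    if (token === eosToken) break;            // stop at end-of-sequence
  }
  return tokens;
}
```

Because the loop is inherently serial, tokens-per-second here is dominated by per-pass GPU efficiency, which is exactly why the benchmark isolates it from model-loading overhead.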
AI Speech (Transformers)
- Model: Moonshine‑Tiny (automatic speech recognition)
- Precision: Hybrid (FP32 encoder + Q4 decoder)
- Measures: Throughput of speech‑to‑text inference under simultaneous load. Isolates GPU runtime efficiency from audio‑processing overhead; the score highlights the capacity for complex, high‑concurrency speech‑to‑text pipelines.
Summary
SpeedPower.run is built to reflect the real‑world compute demands of modern web applications. By stressing CPU, GPU, and main‑thread work concurrently, it provides a holistic view of a browser’s ability to handle the Compute Web—the era where performance is defined not by isolated latency numbers, but by how well the system orchestrates many heavy tasks at once.
Exchange
Since modern apps rely on Web Workers, the “Exchange” benchmark measures the communication bottleneck between the main thread and workers. It tests the transfer speed of:
- IPC
- Transferables
- Arrays
- Buffers
- Objects
- OffscreenCanvas
Higher scores = more efficient main‑thread ↔ background‑worker communication.
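The copy-vs-transfer distinction at the heart of this benchmark can be demonstrated with `structuredClone`, which uses the same semantics as `postMessage`: a clone copies the bytes, while a transfer moves ownership and detaches the source buffer (zero-copy). The sketch below runs in any modern browser or Node 17+.

```javascript
// Copying vs transferring an ArrayBuffer. In a real app, the same choice is
// worker.postMessage(data) vs worker.postMessage(data, [data.buffer]).

function copyBuffer(buf) {
  return structuredClone(buf);                      // deep copy: source stays usable
}

function transferBuffer(buf) {
  return structuredClone(buf, { transfer: [buf] }); // move: source is detached
}

const a = new ArrayBuffer(1024);
const copied = copyBuffer(a);
console.log(a.byteLength, copied.byteLength); // → 1024 1024 (both usable)

const b = new ArrayBuffer(1024);
const moved = transferBuffer(b);
console.log(b.byteLength, moved.byteLength);  // → 0 1024 (b was detached)
```

For the 50 MB payloads mentioned earlier, that copy is exactly the main-thread stall the Exchange score penalizes.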
Architecture: No Installation Required
We were adamant that this should require zero installation or setup. By leveraging WebAssembly (WASM) and WebGPU, we can tap near‑bare‑metal performance of your device directly through the browser.
- No need to download a 5 GB suite to see if your rig is ready for the AI web.
- Click once, and in ~30 seconds we saturate every available thread to find your browser’s breaking point for modern, complex applications.
We are currently collecting data across thousands of hardware/browser combinations to refine our scoring for the “Ultimate Performance” of the modern web.
We’ve seen some fascinating anomalies already, like high‑end mobile ARM chips showing better task‑switching efficiency than some mid‑range x86 desktops due to better thermal‑aware scheduling in the browser.
Does the result match your “real‑world” multitasking experience? Drop your score and your hardware specs in the comments. Let’s talk about the future of compute‑heavy web applications.