Why Your Browser Benchmark is Lying to You About AI Performance
Source: Dev.to
The Shift from “Document Web” to “Compute Web”
For years, we’ve measured web performance through the lens of latency:
- How fast does this script load?
- How quickly can the engine execute this single loop?
However, the Document Web era is over. We are now living in the Compute Web era, where browsers are expected to:
- Run local AI inference
- Process massive data streams
- Handle complex UI states simultaneously
Traditional benchmarks are like testing a 16‑cylinder engine by checking the speed of a single piston. They don’t tell you how the engine performs under actual load.
Why Existing Benchmarks Fall Short
Most benchmarks (e.g., JetStream, Speedometer) focus on sequential execution. While they’re great for measuring JavaScript engine maturity and overall browser performance, they fail to account for Task Saturation.
Peak performance in a modern AI‑driven web app is all about how efficiently the browser can orchestrate concurrent, resource‑intensive tasks, such as:
| Category | Example |
|---|---|
| CPU‑Intensive Work | Pre‑processing large datasets or 50 MB JSON payloads in a Web Worker |
| GPU‑Intensive Work | Running a local AI inference model using WebGPU |
| Main‑Thread Work | Keeping the UI responsive at 60 fps |
The true bottleneck is often the handoff and scheduling between these components, not the raw speed of any single one. Focusing on an isolated variable (e.g., raw GPU speed) fails to capture the Ultimate Performance of an application under real‑world conditions.
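A minimal sketch of why scheduling dominates: a synchronous loop stands in for CPU work and a timer stands in for an asynchronous GPU inference. The function names and timings are illustrative, not SpeedPower.run's actual code; the point is that the overlapped run is bounded by the slower task, while the sequential run pays for both.

```javascript
function cpuTask(iterations) {
  // Synchronous CPU-bound work (stand-in for pre-processing a large JSON payload).
  let acc = 0;
  for (let i = 0; i < iterations; i++) acc += Math.sqrt(i);
  return acc;
}

function gpuTask(ms) {
  // Asynchronous work that completes off the CPU (stand-in for a WebGPU inference).
  return new Promise((resolve) => setTimeout(() => resolve("inference done"), ms));
}

async function sequential() {
  const t0 = Date.now();
  await gpuTask(100); // wait for the "GPU" first...
  cpuTask(3e7);       // ...then do the CPU work
  return Date.now() - t0;
}

async function overlapped() {
  const t0 = Date.now();
  const gpu = gpuTask(100); // kick off the async job first...
  cpuTask(3e7);             // ...and do CPU work while it runs
  await gpu;
  return Date.now() - t0;
}

async function main() {
  const seq = await sequential();
  const conc = await overlapped();
  console.log({ seq, conc }); // the overlapped run finishes sooner
  return { seq, conc };
}
```

Good orchestration is exactly this overlap, multiplied across workers, backends, and the main thread.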
Introducing SpeedPower.run
At ScaleDynamics, we observed that the handoff between CPU and GPU is the primary bottleneck in AI‑driven web apps. Frustrated by synthetic, disconnected benchmarks, we built SpeedPower.run—the definitive benchmark for real‑world compute performance on the modern web.
Simultaneous Load Methodology
SpeedPower.run determines your browser and device’s maximum performance by pushing all CPUs and GPUs to their limits simultaneously.
- Concurrent tasks: AI inferences run while heavy JavaScript processing occurs.
- Technologies used: JavaScript, WASM, WebGL, WebGPU.
Ensuring a Fair Score (Methodology & Integrity)
- Zero Network Interference – The timer starts only after all large assets (≈ 400 MB of AI models) are loaded into memory.
- Warm‑up Execution Phase – Allows the browser to finish internal optimizations (e.g., code compilation) before the final score is recorded.
- Score Stability – Statistical regression analysis on peak metrics smooths out system‑level scheduling noise, producing a dependable result rather than a single‑moment snapshot.
- User Guidance – Run the test multiple times to capture the highest possible score, as OS‑level factors can affect performance.
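The warm-up and smoothing steps above can be sketched as a small timing harness. This is illustrative only: the trimmed mean here is a simple stand-in for the statistical regression analysis the benchmark actually performs.

```javascript
// Sketch of the fairness methodology: discard warm-up runs so JIT
// compilation settles, then smooth scheduler noise by trimming outliers
// before averaging. Not SpeedPower.run's code; parameters are illustrative.

function benchmark(task, { warmups = 5, samples = 20, trim = 0.2 } = {}) {
  // Warm-up phase: results are thrown away.
  for (let i = 0; i < warmups; i++) task();

  // Measurement phase.
  const times = [];
  for (let i = 0; i < samples; i++) {
    const t0 = performance.now();
    task();
    times.push(performance.now() - t0);
  }

  // Score stability: drop the fastest and slowest `trim` fraction of runs,
  // then average the rest, so the score isn't a single-moment snapshot.
  times.sort((a, b) => a - b);
  const drop = Math.floor(times.length * trim);
  const kept = times.slice(drop, times.length - drop);
  return kept.reduce((sum, t) => sum + t, 0) / kept.length; // mean time in ms
}
```

Usage: `benchmark(() => JSON.parse(payload))` yields a stable per-run time even when the OS scheduler occasionally preempts a sample.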
Core Benchmarks
1. JavaScript
Measures raw computational power for pre/post‑processing on JS objects and JSON.
- Uses four tests from the Apple/WebKit JetStream 2 suite:
- Access Binary Trees
- Control Flow Recursive
- Regexp DNA
- String Tag Cloud
- Runs these benchmarks in parallel across multiple Web Workers to gauge maximum multi‑core CPU processing power.
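The fan-out across workers can be sketched with a hypothetical `partition` helper that splits a fixed iteration budget into one chunk per worker; in the browser, each chunk would be posted to a Web Worker running one of the JetStream kernels, with the worker count typically taken from `navigator.hardwareConcurrency`.

```javascript
// Illustrative chunking logic for saturating every core: split `total`
// iterations into `workers` near-equal chunks. The first `remainder`
// chunks each take one extra iteration so nothing is dropped.

function partition(total, workers) {
  const base = Math.floor(total / workers);
  const remainder = total % workers;
  return Array.from({ length: workers }, (_, i) => base + (i < remainder ? 1 : 0));
}

console.log(partition(10, 4)); // → [3, 3, 2, 2]
```

Saturating all cores this way is what separates a multi-core score from the single-piston measurements of traditional suites.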
2. AI with TensorFlow.js
AI Recognition (TFJS)
- Model: BlazeFace
- Input: 128 × 128 tensor, pre‑warmed graph
- Measures: Forward‑pass speed + post‑processing (decoding highest‑confidence face detection) across backends (JS, WASM, WebGL, WebGPU).
AI Classification (TFJS)
- Model: MobileNetV3‑Small
- Input: 224 × 224 tensor, pre‑warmed graph
- Measures: Same as above, focusing on classification throughput.
3. AI with Transformers.js (v3)
AI Classification (Transformers)
- Model: MobileNetV4‑Small
- Backend: Prioritizes high‑performance WebGPU (fallback to WebGL)
- Input: Fixed 224 × 224 tensor
- Measures: Parallel inference capacity using asynchronous command queues and compute shaders.
AI LLM (Transformers)
- Model: SmolLM2‑135M‑Instruct (causal language model)
- Format: 4‑bit quantized (q4) ONNX
- Measures: GPU runtime efficiency isolated from model‑loading overhead; multi‑threaded LLM execution and real‑time autoregressive decoding.
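Autoregressive decoding, the loop this benchmark times, can be sketched with a toy greedy decoder. Here `nextLogits` is a hypothetical stand-in for the quantized model's forward pass; the real cost the benchmark measures is that each new token requires one full pass.

```javascript
// Toy greedy autoregressive decoding: each step runs one forward pass,
// picks the highest-scoring token, and feeds it back as input.

function argmax(logits) {
  let best = 0;
  for (let i = 1; i < logits.length; i++) if (logits[i] > logits[best]) best = i;
  return best;
}

function decode(nextLogits, promptTokens, maxNewTokens, eosToken) {
  const tokens = [...promptTokens];
  for (let i = 0; i < maxNewTokens; i++) {
    const token = argmax(nextLogits(tokens)); // one forward pass per token
    tokens.push(token);
    if (token === eosToken) break;            // stop at end-of-sequence
  }
  return tokens;
}
```

Because the loop is inherently serial, tokens-per-second here is dominated by per-pass GPU efficiency, which is exactly why the benchmark isolates it from model-loading overhead.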
AI Speech (Transformers)
- Model: Moonshine‑Tiny (automatic speech recognition)
- Precision: Hybrid (FP32 encoder + Q4 decoder)
- Measures: Throughput of speech‑to‑text inference under simultaneous load. Isolates GPU runtime efficiency from audio‑processing overhead; the score highlights the capacity for complex, high‑concurrency speech‑to‑text pipelines.
Summary
SpeedPower.run is built to reflect the real‑world compute demands of modern web applications. By stressing CPU, GPU, and main‑thread work concurrently, it provides a holistic view of a browser’s ability to handle the Compute Web—the era where performance is defined not by isolated latency numbers, but by how well the system orchestrates many heavy tasks at once.
Exchange
Since modern apps rely on Web Workers, the “Exchange” benchmark measures the communication bottleneck between the main thread and workers. It tests the transfer speed of:
- IPC
- Transferables
- Arrays
- Buffers
- Objects
- OffscreenCanvas
Higher scores = more efficient main‑thread ↔ background‑worker communication.
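The copy-vs-transfer distinction at the heart of this benchmark can be demonstrated with `structuredClone`, which uses the same semantics as `postMessage`: a clone copies the bytes, while a transfer moves ownership and detaches the source buffer (zero-copy). The sketch below runs in any modern browser or Node 17+.

```javascript
// Copying vs transferring an ArrayBuffer. In a real app, the same choice is
// worker.postMessage(data) vs worker.postMessage(data, [data.buffer]).

function copyBuffer(buf) {
  return structuredClone(buf);                      // deep copy: source stays usable
}

function transferBuffer(buf) {
  return structuredClone(buf, { transfer: [buf] }); // move: source is detached
}

const a = new ArrayBuffer(1024);
const copied = copyBuffer(a);
console.log(a.byteLength, copied.byteLength); // → 1024 1024 (both usable)

const b = new ArrayBuffer(1024);
const moved = transferBuffer(b);
console.log(b.byteLength, moved.byteLength);  // → 0 1024 (b was detached)
```

For the 50 MB payloads mentioned earlier, that copy is exactly the main-thread stall the Exchange score penalizes.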
Architecture: No Installation Required
We were adamant that this should require zero installation or setup. By leveraging WebAssembly (WASM) and WebGPU, we can tap near‑bare‑metal performance of your device directly through the browser.
- No need to download a 5 GB suite to see if your rig is ready for the AI web.
- Click once, and in ~30 seconds we saturate every available thread to find your browser’s breaking point for modern, complex applications.
We are currently collecting data across thousands of hardware/browser combinations to refine our scoring for the “Ultimate Performance” of the modern web.
We’ve seen some fascinating anomalies already, like high‑end mobile ARM chips showing better task‑switching efficiency than some mid‑range x86 desktops due to better thermal‑aware scheduling in the browser.
Does the result match your “real‑world” multitasking experience? Drop your score and your hardware specs in the comments. Let’s talk about the future of compute‑heavy web applications.