Mamba-2 vs Griffin vs RWKV-6: SSM Architecture Benchmark

Published: February 14, 2026, 4:34 PM EST
1 min read
Source: Dev.to

The quadratic complexity of attention — $O(n^2)$ for sequence length $n$ — stopped being theoretical the moment context windows hit 128 k tokens. State Space Models (SSMs) promise $O(n)$ complexity without sacrificing quality, but three architectures dominate 2026: Mamba‑2, Griffin, and RWKV‑6.

I benchmarked all three on the same 1.3 B‑parameter budget. The results challenged what I thought I knew about attention alternatives.

What Makes SSMs Different From Transformers

Transformers compute attention scores between every token pair. For a 10 k token sequence, that’s 100 M comparisons. SSMs instead maintain a fixed‑size hidden state that gets updated sequentially:

$$ h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t $$

$$ y_t = C\,h_t $$

The matrices $\bar{A}, \bar{B}, C$ are learned, but crucially $h_t$ doesn’t grow with sequence length. You process 10 tokens or 100 k tokens with the same memory footprint.
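The recurrence above can be sketched directly as a sequential scan. This is a minimal illustration of the generic linear-SSM update, not the actual Mamba-2, Griffin, or RWKV-6 implementation; the shapes and the function name `ssm_scan` are assumptions for clarity.

```python
import numpy as np

def ssm_scan(A_bar, B_bar, C, xs):
    """Run the linear SSM recurrence over an input sequence.

    A_bar: (d, d) state-transition matrix
    B_bar: (d,)   input projection (scalar input per step, for simplicity)
    C:     (d,)   output projection
    xs:    (T,)   input sequence

    The hidden state h is a fixed-size (d,) vector: memory stays
    constant no matter how long xs is.
    """
    d = A_bar.shape[0]
    h = np.zeros(d)
    ys = np.empty(len(xs))
    for t, x in enumerate(xs):
        h = A_bar @ h + B_bar * x   # h_t = A_bar h_{t-1} + B_bar x_t
        ys[t] = C @ h               # y_t = C h_t
    return ys

# Toy example: d = 1, A_bar = 0.5 acts as exponential decay of the state.
A = np.array([[0.5]])
B = np.array([1.0])
C = np.array([1.0])
print(ssm_scan(A, B, C, np.array([1.0, 1.0, 1.0])))  # [1.   1.5  1.75]
```

Note that the loop is what a naive implementation looks like; the production architectures replace it with parallel scans or chunked matrix forms, which is where their speed comes from.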
