Mamba-2 vs Griffin vs RWKV-6: SSM Architecture Benchmark
The quadratic complexity of attention — $O(n^2)$ for sequence length $n$ — stopped being theoretical the moment context windows hit 128k tokens. State Space Models (SSMs) promise $O(n)$ complexity without sacrificing quality, and three architectures dominate 2026: Mamba-2, Griffin, and RWKV-6.
I benchmarked all three on the same 1.3B-parameter budget. The results challenged what I thought I knew about attention alternatives.

What Makes SSMs Different From Transformers
Transformers compute attention scores between every token pair. For a 10k-token sequence, that's 100M comparisons. SSMs instead maintain a fixed-size hidden state that gets updated sequentially:
$$ h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t $$
$$ y_t = C\,h_t $$
The matrices $\bar{A}, \bar{B}, C$ are learned, but crucially $h_t$ doesn't grow with sequence length. You process 10 tokens or 100k tokens with the same memory footprint.
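To make the recurrence concrete, here's a minimal NumPy sketch of that update loop. It's deliberately simplified: a single input channel, fixed random matrices, and none of the input-dependent ("selective") parameterization that Mamba-2, Griffin, and RWKV-6 actually layer on top — the point is just that the state $h_t$ stays the same size no matter how long the sequence gets.

```python
# Minimal sketch of the SSM recurrence above (illustrative, not any specific architecture).
# Assumptions: one input channel, time-invariant A_bar/B_bar/C chosen at random.
import numpy as np

def ssm_scan(x, A_bar, B_bar, C):
    """Run h_t = A_bar @ h_{t-1} + B_bar * x_t, y_t = C @ h_t over a 1-D sequence."""
    d_state = A_bar.shape[0]
    h = np.zeros(d_state)               # hidden state: fixed size, independent of sequence length
    ys = []
    for x_t in x:                        # one sequential update per token
        h = A_bar @ h + B_bar * x_t      # state update
        ys.append(C @ h)                 # readout
    return np.array(ys)

# Toy usage: 16-dim state, scalar inputs. Memory stays constant whether
# the sequence has 10 tokens or 100k tokens.
rng = np.random.default_rng(0)
d_state = 16
A_bar = np.eye(d_state) * 0.9            # stable decay so the state doesn't blow up
B_bar = rng.normal(size=d_state)
C = rng.normal(size=d_state)

y = ssm_scan(rng.normal(size=1000), A_bar, B_bar, C)
print(y.shape)  # (1000,)
```

In practice the three architectures differ in how they compute these matrices per token and how they parallelize the scan on GPU, but the constant-memory recurrence is the shared core.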