[Paper] SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders

Published: 3 days ago (May 7, 2026 at 01:28 PM EDT)

5 min read

Source: arXiv

Source: arXiv - 2605.06610v1

Overview

Sparse autoencoders (SAEs) have become a go‑to tool for turning the opaque activations of large language models and vision transformers into human‑readable concepts. The new SoftSAE paper shows that forcing every input to use the same fixed number of active latent units (the classic “Top‑K” approach) is sub‑optimal—real data varies in complexity, so the sparsity level should be adaptive. By introducing a differentiable “soft” Top‑K operator, SoftSAE learns an input‑dependent sparsity budget, automatically turning on more features for complex inputs and fewer for simple ones.

Key Contributions

Dynamic sparsity: Introduces a Soft Top‑K operator that lets the autoencoder decide, per‑sample, how many latent units to activate.
Differentiable selection: The soft operator is fully differentiable, enabling end‑to‑end training without resorting to reinforcement‑learning tricks or hard thresholds.
Improved interpretability: Demonstrates that the adaptive sparsity yields cleaner, more monosemantic features that align better with the intrinsic dimensionality of the data manifold.
Empirical validation: Shows on both language (LLM hidden states) and vision (ViT embeddings) benchmarks that SoftSAE matches or exceeds fixed‑K baselines while using fewer active units on average.
Open‑source implementation: Provides a ready‑to‑use PyTorch library, making it easy for practitioners to plug SoftSAE into existing interpretability pipelines.

Methodology

Encoder‑decoder backbone: Like standard SAEs, SoftSAE maps a high‑dimensional activation vector x → latent code z → reconstruction x̂.
Soft Top‑K layer: Instead of a hard arg‑max that selects the top K entries, SoftSAE computes a soft ranking using a temperature‑controlled softmax over the absolute latent values. This yields a continuous mask m(x) whose entries sum to an effective sparsity k(x) that the network learns.
Learned sparsity budget: A small auxiliary network predicts the appropriate temperature (or directly the target k) from the input, allowing the model to allocate more units when the input lies in a high‑dimensional region of the data manifold.
Loss function: Combines reconstruction error (MSE or cross‑entropy) with an ℓ₁ penalty on the masked latent code, encouraging overall sparsity while still permitting the dynamic budget to grow when needed.
Training: All components are differentiable, so standard stochastic gradient descent (Adam) suffices. No extra reinforcement‑learning or curriculum steps are required.

Results & Findings

Dataset / Model	Fixed‑K Top‑K SAE	SoftSAE (dynamic)
GPT‑2 hidden states (layer 12)	Avg. 0.87 bits reconstruction loss, 5.2 active units per token	0.81 bits loss, 3.8 → 7.1 active units (adapted per token)
ViT‑B/16 embeddings (ImageNet)	1.12 bits loss, 6 active units	0.98 bits loss, adaptive 4‑9 active units
Synthetic manifold (varying intrinsic dim.)	Over‑sparse on low‑dim points, under‑sparse on high‑dim points	Correctly matches local dimensionality, yielding lower KL divergence to ground‑truth sparsity distribution

Takeaway: SoftSAE consistently reduces reconstruction error while using fewer total activations on average, and more importantly, it allocates capacity where the data truly needs it. Qualitative inspection shows cleaner, more semantically isolated neurons (e.g., “color‑red” vs. “object‑car”) compared to the noisy mixes often seen in fixed‑K SAEs.

Practical Implications

Sharper model introspection: Developers building interpretability dashboards can rely on fewer, more meaningful concepts per token or image patch, making downstream analysis (e.g., concept probing, feature attribution) less noisy.
Resource‑efficient deployment: Because many inputs activate only a handful of latent units, downstream tasks that consume the SAE codes (e.g., clustering, retrieval) can be accelerated with sparse matrix operations.
Adaptive compression: In scenarios where you need to store or transmit latent representations (e.g., edge inference), SoftSAE’s variable‑length codes can reduce bandwidth without sacrificing fidelity.
Plug‑and‑play for existing pipelines: The open‑source PyTorch module can replace the standard Top‑K layer in any SAE‑based interpretability workflow with minimal code changes.
Potential for curriculum learning: The dynamic sparsity signal could be used to guide curriculum strategies—starting with simple inputs (few active units) and gradually exposing the model to richer representations.

Limitations & Future Work

Temperature tuning: Although the soft operator is learnable, the temperature hyper‑parameter still needs careful initialization; extreme values can lead to either near‑hard selection (losing differentiability) or overly diffuse masks.
Scalability to billions of neurons: The current implementation scales well to typical SAE sizes (≈10k latent units) but may hit memory bottlenecks when applied to ultra‑large latent spaces without additional sparsity‑aware kernels.
Evaluation on downstream tasks: The paper focuses on reconstruction and interpretability metrics; assessing how SoftSAE‑derived concepts improve downstream tasks (e.g., prompt engineering, bias detection) remains an open question.
Extension to multimodal models: Future work could explore joint dynamic sparsity across text‑image embeddings, where the optimal K may depend on cross‑modal interactions.

Bottom line: SoftSAE offers a practical, drop‑in upgrade for anyone using sparse autoencoders to dissect neural networks, delivering cleaner concepts and smarter resource usage by letting the data decide how many features it really needs.

Authors

Jakub Stępień
Marcin Mazur
Jacek Tabor
Przemysław Spurek

Paper Information

arXiv ID: 2605.06610v1
Categories: cs.LG, cs.CV
Published: May 7, 2026
PDF: Download PDF

[Paper] SoftSAE: Dynamic Top-K Selection for Adaptive Sparse Autoencoders

Overview

Key Contributions

Methodology

Results & Findings

Practical Implications

Limitations & Future Work

Authors

Paper Information

Related posts

[Paper] Normalizing Trajectory Models

[Paper] EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction

[Paper] Flow-OPD: On-Policy Distillation for Flow Matching Models

[Paper] SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation