[Paper] KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models

Published: December 8, 2025 at 06:13 AM EST
4 min read

Source: arXiv - 2512.07437v1

Overview

DreamerV3 has set a high bar for sample‑efficient, online model‑based reinforcement learning (MBRL), but it still relies on traditional multilayer perceptrons (MLPs) for many of its internal predictions. This paper explores a fresh alternative: Kolmogorov‑Arnold Networks (KANs)—a newer class of neural blocks that promise tighter parameter budgets and better interpretability. By swapping out key MLP and convolutional pieces of DreamerV3 with KAN‑based layers (including the faster FastKAN variant), the authors create KAN‑Dreamer, a prototype world‑model that retains DreamerV3’s performance while opening the door to more compact, explainable agents.

Key Contributions

  • KAN‑Dreamer prototype – integrates KAN/FastKAN layers into DreamerV3’s visual perception, latent dynamics, and behavior learning subsystems.
  • Fully vectorized JAX implementation – custom FastKAN code that eliminates per‑sample grid handling, keeping inference fast enough for online RL (see the sketch after this list).
  • Empirical benchmark on DeepMind Control Suite (walker_walk) – evaluates sample efficiency, wall‑clock training time, and asymptotic returns.
  • Drop‑in replacement findings – FastKAN works as a direct substitute for the Reward and Continue predictors without hurting performance or speed.
  • Open‑source baseline – the authors release the adapted code, providing a starting point for future KAN‑based world‑model research.
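
To make the second bullet concrete, here is a minimal, vectorized FastKAN‑style layer in JAX/Flax. This is an illustrative sketch, not the authors' released implementation: the grid range, bandwidth, and layer sizes are assumptions, and the input normalization that usually precedes the RBFs is omitted. The point is that a fixed grid shared by all samples turns per‑sample grid handling into plain broadcasting.

```python
# Illustrative FastKAN-style layer (NOT the authors' code); sizes and grid
# range are assumptions, and input normalization is omitted for brevity.
import jax
import jax.numpy as jnp
import flax.linen as nn


class FastKANLayer(nn.Module):
    features: int           # output dimension
    num_grids: int = 8      # number of RBF centers per input feature
    grid_min: float = -2.0  # assumed input range after normalization
    grid_max: float = 2.0

    @nn.compact
    def __call__(self, x):
        # x: (batch, d_in). A fixed grid shared across all samples means no
        # per-sample grid construction; everything below is broadcasting.
        grid = jnp.linspace(self.grid_min, self.grid_max, self.num_grids)
        h = (self.grid_max - self.grid_min) / (self.num_grids - 1)  # RBF bandwidth
        # Gaussian RBF features, shape (batch, d_in, num_grids)
        phi = jnp.exp(-(((x[..., None] - grid) / h) ** 2))
        # Flatten the basis dimension and learn one linear mix of all
        # univariate basis responses (the "sum of univariate functions" idea).
        phi = phi.reshape(x.shape[0], -1)
        return nn.Dense(self.features)(phi)


# Usage sketch: a two-layer FastKAN head (sizes are placeholders).
if __name__ == "__main__":
    head = nn.Sequential([FastKANLayer(64), nn.relu, FastKANLayer(1)])
    params = head.init(jax.random.PRNGKey(0), jnp.zeros((16, 128)))
    out = head.apply(params, jnp.ones((16, 128)))  # shape (16, 1)
```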

Methodology

  1. Identify substitution points – The authors examined DreamerV3’s architecture and selected three modules where MLPs are heavily used: (a) the visual encoder’s final projection, (b) the reward predictor, and (c) the “continue” (episode‑termination) predictor.
  2. Swap with KAN/FastKAN – Standard KAN layers (which use a learned sum of univariate basis functions) were replaced by FastKAN, a variant that employs radial basis functions (RBFs) for faster forward passes (a minimal swap is sketched after this list).
  3. Vectorized JAX kernels – To keep the JAX‑based world model efficient, the team rewrote FastKAN to operate on whole batches at once, removing the need for per‑sample grid construction that would otherwise dominate runtime.
  4. Three‑subsystem evaluation – Experiments were organized around (i) Visual Perception (how well the encoder extracts latent images), (ii) Latent Prediction (the dynamics model’s ability to forecast future latent states), and (iii) Behavior Learning (policy and value learning).
  5. Benchmark protocol – Using the walker_walk task from the DeepMind Control Suite, they measured:
    • Sample efficiency (reward vs. environment steps)
    • Training wall‑clock time (seconds per million steps)
    • Final performance (average return after convergence).
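
As referenced in step 2, the sketch below illustrates what a drop‑in head swap could look like. The constructor and argument names are hypothetical and do not mirror DreamerV3's actual module API; FastKANLayer refers to the class from the earlier sketch.

```python
# Sketch of the drop-in substitution idea: the reward and "continue"
# predictors are small heads over the latent state, so an MLP head and a
# FastKAN head can sit behind one constructor. Names are illustrative only.
import flax.linen as nn


class MLPHead(nn.Module):
    out_dim: int
    hidden: int = 256

    @nn.compact
    def __call__(self, z):
        z = nn.relu(nn.Dense(self.hidden)(z))
        return nn.Dense(self.out_dim)(z)


def make_head(kind: str, out_dim: int) -> nn.Module:
    # kind="mlp" keeps the DreamerV3-style baseline; kind="fastkan" swaps in
    # the RBF-based layer while the rest of the agent is left untouched.
    # FastKANLayer is the class defined in the earlier sketch.
    if kind == "mlp":
        return MLPHead(out_dim)
    if kind == "fastkan":
        return nn.Sequential([FastKANLayer(64), nn.relu, FastKANLayer(out_dim)])
    raise ValueError(f"unknown head type: {kind}")


# reward_head = make_head("fastkan", out_dim=1)    # scalar reward prediction
# continue_head = make_head("fastkan", out_dim=1)  # episode-continuation logit
```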

Results & Findings

Key metrics on walker_walk (MLP baseline vs. FastKAN replacement):

  • Reward predictor – Sample efficiency: ≈ 95 % of optimal (MLP) vs. ≈ 94 % (FastKAN); no statistically significant drop.
  • Continue predictor – Training speed: 120 s per 1M steps (MLP) vs. 118 s (FastKAN); ≈ 2 % faster.
  • Visual encoder (projection) – Final return after 1M steps: 850 (MLP) vs. 842 (FastKAN); within a 1 % margin.

  • Parity in performance – FastKAN matches the MLP baseline on both sample efficiency and asymptotic return, confirming that the richer functional basis does not degrade learning.
  • Negligible overhead – The vectorized FastKAN implementation keeps the wall‑clock time essentially unchanged, challenging the common concern that KANs are too slow for online RL.
  • Parameter savings – FastKAN layers achieve comparable results with roughly 30 % fewer trainable parameters in the replaced modules, hinting at more compact models for edge devices.

Practical Implications

Takeaways by audience:

  • RL engineers – You can experiment with KAN/FastKAN as a drop‑in for MLP heads in existing Dreamer‑style pipelines without re‑architecting the whole system.
  • Embedded/IoT developers – The reduced parameter count translates to smaller memory footprints, making model‑based RL feasible on constrained hardware (e.g., microcontrollers, robotics).
  • Interpretability‑focused teams – KAN’s univariate basis functions are inherently more explainable than dense weight matrices, opening avenues for debugging policy decisions in safety‑critical domains.
  • Framework maintainers (JAX/Flax, PyTorch) – The paper provides a fully vectorized FastKAN implementation that can be reused across other JAX‑based projects, encouraging broader adoption of KANs.
  • Research labs – KAN‑Dreamer serves as a baseline for exploring richer world‑model components (e.g., KAN‑based dynamics or attention modules) without sacrificing training speed.

In short, KAN‑Dreamer shows that parameter‑efficient, more interpretable networks can be integrated into high‑performance model‑based RL without a trade‑off in speed—a promising signal for production‑grade agents that must run on limited compute.
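
For engineers who want to sanity‑check the compactness argument on their own head sizes, counting the trainable parameters of each candidate Flax module is enough. The helper below is a generic sketch; the input shape in the commented usage is a placeholder, not the paper's configuration.

```python
# Count trainable parameters of any Flax module to compare head variants.
import jax
import jax.numpy as jnp


def count_params(module, input_shape):
    variables = module.init(jax.random.PRNGKey(0), jnp.zeros(input_shape))
    return sum(p.size for p in jax.tree_util.tree_leaves(variables))


# Hypothetical comparison using the heads sketched above:
# print(count_params(make_head("mlp", 1), (1, 512)))
# print(count_params(make_head("fastkan", 1), (1, 512)))
```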

Limitations & Future Work

  • Scope limited to a single task – Experiments focus on walker_walk; broader validation across diverse benchmarks (e.g., other DeepMind Control tasks, Atari, robotics) is needed.
  • Partial substitution – Only the reward and continue predictors (and a visual projection) were swapped; the core dynamics model still uses conventional MLPs.
  • FastKAN hyper‑parameters – The RBF bandwidth and grid resolution were hand‑tuned; automated search could yield even better efficiency‑accuracy trade‑offs.
  • Interpretability study missing – While KANs are touted as more explainable, the paper does not quantify or demonstrate concrete interpretability gains.

Future work could extend KAN replacements to the full latent dynamics, explore hybrid KAN‑MLP architectures, and benchmark on real‑world robotics platforms where parameter budgets and latency matter most.

Authors

  • Chenwei Shi
  • Xueyu Luan

Paper Information

  • arXiv ID: 2512.07437v1
  • Categories: cs.LG, cs.AI, cs.CV, cs.NE, cs.RO
  • Published: December 8, 2025