[Paper] Necessary and sufficient conditions for universality of Kolmogorov-Arnold networks
Source: arXiv - 2604.23765v1
Overview
The paper investigates when Kolmogorov‑Arnold Networks (KANs) can approximate any continuous function on a compact domain. By focusing on the edge functions (the scalar functions applied along each network edge), the author pinpoints the minimal non‑linear ingredient needed for universal approximation. The results tighten the theoretical guarantees for KANs and give developers concrete guidance on how to build expressive, yet lightweight, KAN‑based models.
Key Contributions
- Single non‑affine edge suffices for deep KANs – the class of deep KANs whose edge functions are either affine or a single fixed continuous non‑affine function σ is dense in C(K) (a minimal sketch follows this list).
- Two‑layer KANs need a non‑polynomial σ – for networks with exactly two hidden layers, universality holds iff σ is not a polynomial.
- Finite affine set replaces the whole affine family – a small, fixed collection of affine functions (as few as five) together with σ already guarantees universality, regardless of depth.
- Constructive affine families – for every non‑affine σ there exists a concrete finite set A_σ of affine maps such that KANs using only functions from A_σ ∪ {σ} remain universal.
- Spline‑based edge parameterization is universal – the recent spline‑edge formulation (Liu et al., 2024) retains universal approximation even when spline degree and knot locations are fixed beforehand.
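To make the first contribution concrete, here is a minimal sketch (assuming NumPy) of a KAN layer in which every edge carries its own univariate function: all edges are affine, and a single designated edge in the whole network applies a non‑affine σ (ReLU here). The class name KANLayer, the widths, and the placement of the σ edge are illustrative choices, not the paper's construction; the theorem concerns density of the architecture class, not any fixed instance.

```python
import numpy as np

def relu(x):
    """One continuous, non-affine choice for sigma."""
    return np.maximum(x, 0.0)

class KANLayer:
    """Minimal KAN layer: each edge (i, j) applies its own univariate
    function to input i, and output node j sums its incoming edges.
    Every edge is affine, t -> w*t + b, except an optional single
    edge that applies sigma instead."""

    def __init__(self, n_in, n_out, sigma_edge=None, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(n_in, n_out))  # per-edge slopes
        self.b = rng.normal(size=(n_in, n_out))  # per-edge offsets
        self.sigma_edge = sigma_edge             # (i, j) or None

    def __call__(self, x):
        # edge_out[k, i, j] = value of edge (i, j) on sample k
        edge_out = self.w * x[:, :, None] + self.b
        if self.sigma_edge is not None:
            i, j = self.sigma_edge
            edge_out[:, i, j] = relu(x[:, i])    # the lone sigma edge
        return edge_out.sum(axis=1)              # node j sums its in-edges

# Three stacked layers; only one edge in the entire network is non-affine.
layers = [KANLayer(2, 8, sigma_edge=(0, 0)),  # the single sigma edge
          KANLayer(8, 8, seed=1),             # purely affine
          KANLayer(8, 1, seed=2)]             # purely affine
y = np.random.default_rng(3).normal(size=(4, 2))
for layer in layers:
    y = layer(y)
print(y.shape)  # -> (4, 1)
```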
Methodology
- Theoretical framework – The analysis builds on the classical Kolmogorov superposition theorem, which states that every continuous multivariate function can be written exactly as a finite superposition of continuous univariate functions and addition (see the formula displayed after this list).
- Edge‑function classification – Edge functions are split into two categories: affine (linear + bias) and a single “special” continuous function σ.
- Constructive approximation proofs
  - For deep networks, the author shows how to embed the Kolmogorov representation using only σ for the non‑linear part, while the affine edges handle the required linear combinations.
  - For two‑layer networks, a polynomial σ fails to generate the needed richness: with the architecture fixed, sums and compositions of affine maps and a degree‑d polynomial yield only polynomials of bounded degree, which are never dense in C(K). A non‑polynomial σ, by contrast, can reproduce the required basis functions.
- Finite affine families – By exploiting linear independence of a small set of affine maps, the paper proves that any additional affine function can be expressed as a linear combination of a fixed finite basis, preserving universality.
- Spline edge analysis – The spline parameterization is treated as a specific σ (a piecewise‑polynomial). The author shows that, even with a predetermined knot sequence and degree, the spline family satisfies the non‑polynomial condition, thus guaranteeing universality.
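For reference, a standard statement of the superposition theorem invoked in the first item above is the following (the paper may work with a refined variant):

```latex
% Kolmogorov superposition theorem: every continuous
% f : [0,1]^n -> R has an exact representation using
% 2n+1 outer and n(2n+1) inner continuous univariate functions.
f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\Bigl(\sum_{p=1}^{n} \phi_{q,p}(x_p)\Bigr)
```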
Results & Findings
| Setting | Condition on σ | Universality? | Remarks |
|---|---|---|---|
| Deep KAN (≥ 3 hidden layers) | σ non‑affine (any continuous function that is not affine) | Yes | One non‑affine edge is enough; all other edges may be purely affine. |
| Two‑layer KAN | σ non‑polynomial (e.g., ReLU, tanh, spline) | Yes | Polynomial σ (including quadratic) cannot achieve density. |
| Affine edge set | Replace the full affine family by a fixed finite set (as few as five maps) | Yes (with any suitable σ) | Reduces the design space without sacrificing expressive power. |
| Spline‑edge KAN | Fixed degree & knot sequence, σ = spline basis | Yes | Confirms Liu et al.’s empirical success on theoretical grounds. |
In plain terms, the paper proves that adding just one well‑chosen non‑linear scalar function to the edge repertoire unlocks the full approximation power of KANs, and that the rest of the network can be built from a tiny, predetermined toolbox of linear maps.
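As a toy illustration of that toolbox view, the sketch below (assuming NumPy) builds node outputs from five fixed affine maps plus one σ. The particular maps are placeholders, not the paper's constructed family A_σ.

```python
import numpy as np

# Hypothetical fixed affine toolbox: the paper shows some finite
# family like this (as few as five maps), together with one
# non-affine sigma, already suffices; these maps are illustrative.
AFFINE_TOOLBOX = [
    lambda t: t,              # identity
    lambda t: t + 1.0,        # shift
    lambda t: 2.0 * t,        # dilation
    lambda t: -t,             # reflection
    lambda t: 0.5 * t - 1.0,  # contraction plus shift
]

def sigma(t):
    return np.tanh(t)  # any continuous non-affine choice works

def node(inputs, edge_fns):
    # A KAN node sums its edge functions applied to its inputs.
    return sum(f(x) for f, x in zip(edge_fns, inputs))

x1, x2 = 0.3, -0.7
y = node([x1, x2], [AFFINE_TOOLBOX[2], sigma])
print(y)  # 2*x1 + tanh(x2)
```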
Practical Implications
- Simplified architecture design – When building a KAN, you don’t need a zoo of activation functions. Choose any continuous non‑affine σ (e.g., ReLU, sigmoid, a spline) and pair it with a handful of fixed affine transforms; the network will be universal.
- Parameter‑budget friendly models – Because only a finite set of affine maps is required, you can pre‑compute or hard‑code these linear transformations, reducing runtime overhead and memory footprint.
- Guidance for activation selection – For shallow (two‑layer) KANs, avoid polynomial activations (e.g., pure quadratic) and opt for non‑polynomial ones. This explains why ReLU‑based KANs work well in practice.
- Confidence in spline‑based KANs – Developers using the spline edge parameterization can now rely on a solid universal approximation guarantee even if they fix the spline degree and knot positions, simplifying hyper‑parameter tuning (see the sketch after this list).
- Potential for hardware acceleration – Since the affine part can be limited to a small, known set, custom ASIC/FPGA implementations can cache these linear maps, leaving only the evaluation of a single σ to be accelerated (often already supported for ReLU or piecewise‑linear splines).
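To make the spline guarantee tangible, here is a minimal sketch (assuming SciPy) of one spline edge function with the degree and knot sequence fixed up front, so that only the coefficients would be trained. The helper make_spline_edge and the clamped uniform knot layout are illustrative choices, not the exact parameterization of Liu et al.

```python
import numpy as np
from scipy.interpolate import BSpline

def make_spline_edge(coeffs, degree=3, x_min=-1.0, x_max=1.0):
    """Build a univariate spline edge with a fixed degree and a fixed,
    clamped uniform knot sequence; only `coeffs` (one weight per
    B-spline basis function) would be learned during training."""
    n_basis = len(coeffs)
    n_interior = n_basis - degree - 1
    # Clamped knot vector: boundary knots repeated degree+1 times.
    interior = np.linspace(x_min, x_max, n_interior + 2)[1:-1]
    knots = np.concatenate([[x_min] * (degree + 1),
                            interior,
                            [x_max] * (degree + 1)])
    return BSpline(knots, np.asarray(coeffs, dtype=float), degree)

# Eight trainable coefficients on a cubic spline with frozen knots.
edge = make_spline_edge(coeffs=np.random.default_rng(0).normal(size=8))
xs = np.linspace(-1.0, 1.0, 5)
print(edge(xs))  # the edge function evaluated at five points
```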
Limitations & Future Work
- Constructive constants are not quantified – The proofs guarantee existence of approximations but do not provide explicit bounds on network depth or width needed for a given error tolerance.
- Focus on continuous functions – The universality results apply to C(K) (continuous functions on compact sets). Extending the analysis to Lᵖ spaces or to functions with discontinuities (e.g., classification boundaries) remains open.
- Empirical validation – While the theoretical claims are solid, systematic experiments comparing shallow vs. deep KANs with different σ choices would help translate the results into practical design rules.
- Optimization landscape – The paper does not address how the restricted edge function set influences training dynamics, convergence speed, or susceptibility to local minima.
- Extension to stochastic or adaptive σ – Investigating whether a learnable σ (e.g., a small neural sub‑module) can further reduce the required affine set or improve sample efficiency is a promising direction.
Bottom line for developers: If you’re experimenting with Kolmogorov‑Arnold Networks, pick any non‑affine activation (ReLU, tanh, a spline, etc.), sprinkle in a handful of fixed linear maps, and you have a theoretically guaranteed universal approximator—no need for elaborate activation libraries or massive hidden‑layer designs. This insight can streamline model prototyping, reduce memory usage, and open the door to efficient hardware implementations.
Authors
- Vugar Ismailov
Paper Information
- arXiv ID: 2604.23765v1
- Categories: cs.LG, cs.NE, math.FA
- Published: April 26, 2026