[Paper] Free-RBF-KAN: Kolmogorov-Arnold Networks with Adaptive Radial Basis Functions for Efficient Function Learning
Source: arXiv - 2601.07760v1
Overview
The paper introduces Free‑RBF‑KAN, a new variant of Kolmogorov‑Arnold Networks (KANs) that swaps the traditional B‑spline basis for adaptive radial basis functions (RBFs). By letting the RBF centers, widths, and smoothness parameters be learned directly from data, the authors achieve the same approximation power as classic KANs while cutting training and inference time—an attractive proposition for anyone building high‑performance, low‑latency ML models.
Key Contributions
- Adaptive RBF Grid: Unlike fixed RBF placements, the network learns a “free” grid of RBF centers and scales, aligning the basis with the data’s activation patterns.
- Trainable Smoothness Parameter: Smoothness is treated as a kernel hyper‑parameter and optimized jointly with weights, removing the need for manual tuning.
- Universality Proof for RBF‑KANs: The authors extend the theoretical foundation of KANs, showing that any continuous multivariate function can be approximated arbitrarily well with the proposed RBF formulation.
- Efficiency Gains: Empirical benchmarks demonstrate faster forward/backward passes compared with B‑spline‑based KANs, without extra memory overhead.
- Broad Experimental Validation: Experiments span multiscale function fitting, physics‑informed neural networks (PINNs), and learning solution operators for PDEs, confirming both accuracy and speed benefits.
Methodology
- Network Architecture – A KAN decomposes a multivariate function into a sum of univariate “inner” functions followed by a multivariate “outer” function. Free‑RBF‑KAN replaces each inner univariate function with a weighted sum of Gaussian RBFs (a minimal code sketch follows this list):

  $$f_i(x) = \sum_{k=1}^{K} w_{ik}\,\phi\bigl(\alpha_{ik}(x - c_{ik})\bigr)$$

  where $c_{ik}$ (center), $\alpha_{ik}$ (inverse width), and a global smoothness scalar $\beta$ are all learnable.
- Adaptive Grid Learning – During back‑propagation, gradients flow not only to the linear weights $w_{ik}$ but also to the centers $c_{ik}$ and scales $\alpha_{ik}$. This lets the basis “morph” to match the data distribution, effectively providing a data‑driven resolution grid.
- Smoothness as a Kernel Parameter – The Gaussian kernel is modified to $\phi_{\beta}(z)=\exp(-\beta z^{2})$. The scalar $\beta$ is optimized jointly with the other parameters, allowing the network to trade off smoothness against sharpness automatically.
- Training Pipeline – The authors use standard stochastic gradient descent (Adam) with weight decay. No special regularizers are required; the adaptive parameters are naturally constrained by the loss gradient.
- Theoretical Guarantee – By constructing a dense set of RBFs and leveraging the Kolmogorov‑Arnold representation theorem, they prove that Free‑RBF‑KAN can approximate any continuous function on a compact domain to arbitrary precision.
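Below is a minimal PyTorch sketch of a layer built around this idea; the class and parameter names (`FreeRBFKANLayer`, `num_rbf`, the initialization choices) are our own assumptions, not taken from the authors' released code. Each input feature keeps its own learnable centers and inverse widths, a single smoothness scalar $\beta$ is shared, and all of them receive gradients alongside the mixing weights.

```python
# Minimal sketch (not the authors' released implementation) of a KAN layer whose
# inner univariate functions are weighted sums of Gaussian RBFs with learnable
# centers c_{ik}, inverse widths alpha_{ik}, and a shared smoothness scalar beta.
import torch
import torch.nn as nn


class FreeRBFKANLayer(nn.Module):
    """Maps (batch, in_dim) -> (batch, out_dim) through adaptive RBF expansions."""

    def __init__(self, in_dim: int, out_dim: int, num_rbf: int = 8):
        super().__init__()
        # One adaptive grid of centers/scales per input feature, initialized uniformly on [-1, 1].
        self.centers = nn.Parameter(torch.linspace(-1.0, 1.0, num_rbf).repeat(in_dim, 1))  # (in_dim, K)
        self.log_alpha = nn.Parameter(torch.zeros(in_dim, num_rbf))   # inverse widths, kept positive via exp
        self.log_beta = nn.Parameter(torch.zeros(()))                 # global smoothness scalar beta
        self.weights = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, num_rbf))  # mixing weights w_{ik}

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # z = alpha_{ik} * (x_i - c_{ik}),  phi_beta(z) = exp(-beta * z^2)
        z = torch.exp(self.log_alpha) * (x.unsqueeze(-1) - self.centers)   # (batch, in_dim, K)
        phi = torch.exp(-torch.exp(self.log_beta) * z ** 2)                # (batch, in_dim, K)
        # f_i(x) = sum_k w_{ik} phi(...), summed over inputs for each output unit.
        return torch.einsum("bik,oik->bo", phi, self.weights)
```

Because the centers, widths, and $\beta$ are ordinary `nn.Parameter`s, the training pipeline described above (Adam with weight decay) needs no special handling: back‑propagation adapts the grid and smoothness together with the weights.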
Results & Findings
| Task | Metric (lower is better) | B‑spline KAN | Free‑RBF‑KAN | Speedup (train / infer) |
|---|---|---|---|---|
| Multiscale 1‑D function | MSE | 1.2e‑4 | 1.1e‑4 | 1.8× / 2.1× |
| PINN for Burgers’ equation | Relative L2 error | 3.5e‑3 | 3.3e‑3 | 1.6× / 1.9× |
| PDE operator (Navier‑Stokes) | MAE | 4.8e‑3 | 4.7e‑3 | 1.5× / 1.7× |
- Accuracy: Free‑RBF‑KAN matches or slightly improves on the B‑spline KAN across all benchmarks, confirming that adaptive RBFs close the performance gap observed in earlier RBF‑KAN attempts.
- Efficiency: By eliminating the costly De Boor recursion required for B‑splines, the new model reduces both FLOPs and memory traffic, yielding roughly 1.5–2× faster training and inference.
- Scalability: Experiments up to 64‑dimensional input spaces show stable convergence, indicating that the adaptive grid does not explode combinatorially.
Practical Implications
- Faster Prototyping – Developers can swap a B‑spline KAN for Free‑RBF‑KAN with a drop‑in code change (see the sketch after this list) and immediately see speed gains, especially valuable in edge‑device or real‑time inference scenarios.
- Adaptive Resolution for Scientific ML – In physics‑informed models where solution features (e.g., shocks) are localized, the learnable RBF grid automatically concentrates basis functions where they’re needed, reducing the manual engineering of mesh refinements.
- Low‑Memory Deployments – Since RBFs are parameter‑efficient (no knot vectors), model size stays comparable to classic KANs, making the approach suitable for mobile or embedded AI stacks.
- Plug‑and‑Play with Existing Frameworks – The authors provide a PyTorch implementation that integrates with standard `nn.Module` pipelines, meaning existing training loops, optimizers, and mixed‑precision utilities work out‑of‑the‑box.
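As a hypothetical illustration of that drop‑in workflow, the snippet below reuses the `FreeRBFKANLayer` sketched in the Methodology section inside a standard PyTorch training loop; the two‑layer architecture, hyper‑parameters, and toy multiscale target are our own illustrative choices, not settings from the paper.

```python
# Hypothetical drop-in usage of the FreeRBFKANLayer sketched earlier inside a
# standard PyTorch training loop; architecture and target are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    FreeRBFKANLayer(in_dim=1, out_dim=16, num_rbf=8),
    FreeRBFKANLayer(in_dim=16, out_dim=1, num_rbf=8),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

x = torch.linspace(-1.0, 1.0, 512).unsqueeze(-1)        # (512, 1) inputs
y = torch.sin(20.0 * x) + 0.5 * torch.sin(3.0 * x)      # toy multiscale 1-D target

for step in range(2000):
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()            # gradients also reach centers, widths, and beta
    optimizer.step()
```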
Limitations & Future Work
- Hyper‑parameter Sensitivity – While smoothness is learned, the initial number of RBFs per inner function still needs to be chosen; too few can limit expressivity, too many can increase training time.
- Gradient Stability – Learning centers and widths jointly can lead to occasional “collapse,” where multiple RBFs converge to the same location; the authors mitigate this with small learning‑rate schedules, but a more robust regularizer could help (one possible form is sketched after this list).
- Extension to Non‑Gaussian Kernels – The paper focuses on Gaussian RBFs; exploring other kernels (e.g., Matérn, compact‑support) could further improve performance on specific domains.
- Theoretical Tightness – The universality proof guarantees approximation in the limit; tighter bounds on required RBF count for a given error tolerance remain an open question.
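For illustration only, one form such a regularizer could take (our assumption, not something proposed in the paper) is a pairwise repulsion penalty that discourages the centers of the same inner function from collapsing onto each other:

```python
# Hypothetical illustration (not from the paper): a pairwise repulsion penalty
# that discourages RBF centers within one inner function from collapsing together.
import torch


def center_repulsion(centers: torch.Tensor, scale: float = 0.1) -> torch.Tensor:
    """centers: (in_dim, K) learnable RBF centers; returns a scalar penalty."""
    diff = centers.unsqueeze(-1) - centers.unsqueeze(-2)      # (in_dim, K, K) pairwise differences
    repulsion = torch.exp(-diff ** 2 / scale)                 # large when two centers nearly coincide
    off_diag = 1.0 - torch.eye(centers.shape[-1], device=centers.device)
    return (repulsion * off_diag).sum() / off_diag.sum() / centers.shape[0]
```

It could be added to the data‑fitting loss, e.g. `loss = mse + lam * center_repulsion(layer.centers)`, with the coefficient `lam` treated as one more hyper‑parameter to tune.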
Bottom line: Free‑RBF‑KAN offers a practical, high‑performance alternative to classic KANs, delivering the same expressive power with a leaner computational footprint—an appealing tool for developers building next‑generation function‑approximation models, from scientific simulators to real‑time AI services.
Authors
- Shao‑Ting Chiu
- Siu Wun Cheung
- Ulisses Braga‑Neto
- Chak Shing Lee
- Rui Peng Li
Paper Information
- arXiv ID: 2601.07760v1
- Categories: cs.LG, math.NA
- Published: January 12, 2026