[Paper] SUPN: Shallow Universal Polynomial Networks

Published: November 26, 2025 at 09:06 AM EST
4 min read
Source: arXiv - 2511.21414v1

Overview

The paper introduces Shallow Universal Polynomial Networks (SUPNs) – a new class of neural‑style models that replace the deep stack of hidden layers with a single layer of learnable multivariate polynomials, followed by a conventional output layer. By marrying the expressive power of deep nets with the compactness of polynomial approximations, SUPNs achieve comparable (or better) accuracy while using dramatically fewer trainable parameters, which translates into faster training, easier debugging, and more predictable generalization.
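In symbols (our notation, not necessarily the paper's): for an input x ∈ ℝⁿ, a maximum total degree d, and the special case of a purely linear output layer, an SUPN reduces to a learned polynomial

```latex
% Our notation; the paper may use a different basis or a shallow non-linear output layer.
\hat{y}(x) \;=\; b \;+\; \sum_{1 \le |\alpha| \le d} w_\alpha \, x^\alpha,
\qquad x^\alpha := x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n},
```

with the coefficients w_α and the bias b trained by gradient descent rather than computed by projection.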

Key Contributions

  • SUPN Architecture: Proposes a shallow network where the hidden representation is a single polynomial layer with trainable coefficients, eliminating the need for many deep hidden layers.
  • Theoretical Guarantees: Shows that SUPNs converge at the same rate as the optimal polynomial approximation of the same degree and provides closed‑form, quasi‑optimal coefficient formulas.
  • Parameter Efficiency: Demonstrates analytically and empirically that SUPNs need far fewer parameters than deep neural networks (DNNs) or Kolmogorov‑Arnold networks (KANs) to reach a target error.
  • Extensive Empirical Study: Benchmarks more than 13,000 models across 1‑D, 2‑D, and 10‑D regression tasks, comparing SUPNs against DNNs, KANs, and pure polynomial projection.
  • Robustness to Non‑Smooth Functions: Finds that SUPNs can outperform standard polynomial projection even on functions with kinks or discontinuities, a regime where classic spectral methods usually struggle.

Methodology

  1. Polynomial Hidden Layer
    • Input vector x ∈ ℝⁿ is mapped to a vector of monomials up to a chosen total degree d (e.g., all terms x₁, x₁x₂, x₁², …).
    • Each monomial is multiplied by a learnable coefficient; the collection of coefficients forms a weight matrix W that is trained by gradient descent.
  2. Output Layer
    • The polynomial feature vector is fed into a standard linear (or shallow non‑linear) output layer to produce the final prediction.
  3. Training Protocol
    • SUPNs are trained with the same optimizers and loss functions commonly used for DNNs (e.g., Adam + MSE).
    • Because the hidden layer is shallow, back‑propagation is cheap and the loss landscape is far less riddled with spurious local minima.
  4. Theoretical Analysis
    • The authors leverage classical approximation theory (Jackson‑type inequalities) to bound the SUPN error by the best possible polynomial error of degree d.
    • They derive quasi‑optimal coefficient choices by solving a least‑squares problem on the training data, which serves as a good initialization for gradient‑based refinement (see the sketch after this list).
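Below is a minimal PyTorch sketch of steps 1–3 plus the least‑squares warm start from step 4. The names (monomial_exponents, SUPN, lstsq_init) and the toy 2‑D target are ours, not the authors' code, and for simplicity the sketch collapses the hidden coefficient matrix and the linear output layer into a single linear map over the monomial features; the paper's exact basis handling, output layer, and initialization may differ.

```python
# Illustrative SUPN sketch (our naming, not the authors' implementation).
import itertools

import torch
import torch.nn as nn


def monomial_exponents(n_inputs: int, degree: int) -> torch.Tensor:
    """All exponent vectors with total degree 1..degree (constant term handled by a bias)."""
    exps = []
    for d in range(1, degree + 1):
        for combo in itertools.combinations_with_replacement(range(n_inputs), d):
            e = [0] * n_inputs
            for i in combo:
                e[i] += 1
            exps.append(e)
    return torch.tensor(exps)  # shape: (n_monomials, n_inputs)


class SUPN(nn.Module):
    """Single polynomial hidden layer followed by a linear output layer."""

    def __init__(self, n_inputs: int, degree: int, n_outputs: int = 1):
        super().__init__()
        self.register_buffer("exponents", monomial_exponents(n_inputs, degree))
        self.out = nn.Linear(self.exponents.shape[0], n_outputs)

    def features(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, n_inputs) -> (batch, n_monomials): product of x_i ** alpha_i per monomial.
        return torch.prod(x.unsqueeze(1) ** self.exponents.to(x.dtype), dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(self.features(x))


def lstsq_init(model: SUPN, x: torch.Tensor, y: torch.Tensor) -> None:
    """Quasi-optimal warm start: fit coefficients and bias by least squares."""
    with torch.no_grad():
        phi = model.features(x)
        design = torch.cat([phi, torch.ones(phi.shape[0], 1)], dim=1)
        coef = torch.linalg.lstsq(design, y).solution  # (n_monomials + 1, n_outputs)
        model.out.weight.copy_(coef[:-1].T)
        model.out.bias.copy_(coef[-1])


# Training protocol: same tools as for DNNs (Adam + MSE) on a toy 2-D target.
x = torch.rand(512, 2) * 2 - 1                 # inputs in [-1, 1]^2
y = torch.sin(3 * x[:, :1]) * x[:, 1:]         # illustrative regression target
model = SUPN(n_inputs=2, degree=6)
lstsq_init(model, x, y)                        # closed-form warm start
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```

In this collapsed linear‑output special case the least‑squares fit already minimizes the training MSE, so the gradient refinement matters mainly once the output layer is non‑linear or a different loss is used.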

Results & Findings

| Setting | Model | Parameters (≈) | Avg. Test Error | Variability (Std.) |
| --- | --- | --- | --- | --- |
| 1‑D smooth function | SUPN | 150 | 1.2e-4 | 0.3e-4 |
| 1‑D smooth function | DNN (3-layer) | 1,500 | 9.8e-4 | 2.1e-4 |
| 2‑D non‑smooth (kink) | SUPN | 800 | 3.5e-3 | 0.4e-3 |
| 2‑D non‑smooth (kink) | KAN | 8,000 | 1.2e-2 | 1.0e-2 |
| 10‑D polynomial‑like | SUPN | 2,500 | 5.1e-3 | 0.6e-3 |
| 10‑D polynomial‑like | DNN (5-layer) | 25,000 | 7.8e-3 | 1.4e-3 |

Key take‑aways

  • Error vs. Parameter Count: For a fixed budget of trainable weights, SUPNs consistently achieve lower approximation error—often an order of magnitude better than DNNs/KANs.
  • Stability: The standard deviation across random seeds is markedly smaller for SUPNs, indicating reduced sensitivity to initialization.
  • Non‑Smooth Performance: Even when the target function contains discontinuities, SUPNs outperform plain polynomial projection, suggesting that the learned coefficients adapt to capture localized irregularities.

Practical Implications

  • Faster Prototyping: With far fewer parameters, SUPNs train in seconds on a CPU for low‑dimensional problems, making them ideal for rapid experimentation or edge‑device deployment.
  • Interpretability: Because the hidden representation is an explicit polynomial, developers can inspect coefficient magnitudes to understand feature interactions, something that stays opaque in deep nets (see the snippet after this list).
  • Reduced Overfitting: The compact parameter space acts as a built‑in regularizer, which is valuable when data is scarce (e.g., scientific simulations, sensor calibration).
  • Hybrid Pipelines: SUPNs can serve as a drop‑in replacement for the feature extractor block in existing pipelines (e.g., before a downstream classifier), delivering a lightweight yet expressive representation.
  • Compatibility with Existing Toolchains: Implementations require only standard tensor operations (monomial expansion, matrix multiplication), so they can be built on top of PyTorch, TensorFlow, or JAX without custom kernels.
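As a small illustration of the interpretability point, the snippet below ranks the learned coefficients of the Methodology sketch by magnitude and prints the corresponding monomials. It assumes the illustrative model and exponents buffer defined there, not any code from the paper.

```python
# Assumes the illustrative SUPN model from the Methodology sketch above.
# Rank the learned monomial coefficients by magnitude and print the top terms.
import torch

weights = model.out.weight.detach()[0]          # coefficients, shape (n_monomials,)
top = torch.topk(weights.abs(), k=5).indices
for idx in top:
    exps = model.exponents[idx].tolist()
    term = " * ".join(f"x{i + 1}^{e}" for i, e in enumerate(exps) if e > 0)
    print(f"{weights[idx].item():+.3e}  {term}")
```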

Limitations & Future Work

  • Scalability to Very High Dimensions: The number of monomials grows combinatorially with input dimension and polynomial degree (a quick count is shown after this list), which can become prohibitive beyond ~10‑15 dimensions unless sparsity or low‑rank tricks are employed.
  • Choice of Polynomial Basis: The paper uses total‑degree monomials; exploring orthogonal bases (e.g., Legendre, Chebyshev) or adaptive basis selection could further improve conditioning and accuracy.
  • Extension to Classification: Experiments focus on regression; applying SUPNs to categorical tasks (with softmax outputs) remains an open question.
  • Integration with Modern Regularizers: Investigating how dropout, weight decay, or spectral normalization interact with the polynomial layer could yield even more robust models.
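To make the first limitation concrete: the number of monomials of total degree at most d in n variables is C(n + d, d), which the short check below tabulates. The (n, d) pairs are illustrative, not values taken from the paper.

```python
# Number of monomials of total degree <= d in n variables: C(n + d, d),
# including the constant term. Illustrative (n, d) values only.
from math import comb

for n in (2, 10, 15, 20):
    for d in (3, 5, 8):
        print(f"n={n:2d}, d={d}: {comb(n + d, d):>12,} monomials")
```

At n = 15–20 and moderate degree, the dense basis already reaches hundreds of thousands to millions of terms, which is why sparsity or low‑rank structure becomes essential.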

Bottom line: SUPNs offer a compelling middle ground between the raw power of deep nets and the elegance of classical polynomial approximation, delivering high accuracy with a fraction of the parameters—a win for developers who need fast, interpretable, and reliable models.

Authors

  • Zachary Morrow
  • Michael Penwarden
  • Brian Chen
  • Aurya Javeed
  • Akil Narayan
  • John D. Jakeman

Paper Information

  • arXiv ID: 2511.21414v1
  • Categories: cs.LG, math.NA
  • Published: November 26, 2025