[Paper] SUPN: Shallow Universal Polynomial Networks

Published: November 26, 2025 at 09:06 AM EST
4 min read
Source: arXiv - 2511.21414v1

Overview

The paper introduces Shallow Universal Polynomial Networks (SUPNs) – a new class of neural‑style models that replace the deep stack of hidden layers with a single layer of learnable multivariate polynomials, followed by a conventional output layer. By marrying the expressive power of deep nets with the compactness of polynomial approximations, SUPNs achieve comparable (or better) accuracy while using dramatically fewer trainable parameters, which translates into faster training, easier debugging, and more predictable generalization.
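In symbols (our notation, not necessarily the paper's): for an input x ∈ ℝⁿ, a maximum total degree d, and the special case of a purely linear output layer, an SUPN reduces to a learned polynomial

```latex
% Our notation; the paper may use a different basis or a shallow non-linear output layer.
\hat{y}(x) \;=\; b \;+\; \sum_{1 \le |\alpha| \le d} w_\alpha \, x^\alpha,
\qquad x^\alpha := x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n},
```

with the coefficients w_α and the bias b trained by gradient descent rather than computed by projection.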

Key Contributions

  • SUPN Architecture: Proposes a shallow network where the hidden representation is a single polynomial layer with trainable coefficients, eliminating the need for many deep hidden layers.
  • Theoretical Guarantees: Shows that SUPNs converge at the same rate as the optimal polynomial approximation of the same degree and provides closed‑form, quasi‑optimal coefficient formulas.
  • Parameter Efficiency: Demonstrates analytically and empirically that SUPNs need far fewer parameters than deep neural networks (DNNs) or Kolmogorov‑Arnold networks (KANs) to reach a target error.
  • Extensive Empirical Study: Benchmarks more than 13,000 models across 1‑D, 2‑D, and 10‑D regression tasks, comparing SUPNs against DNNs, KANs, and pure polynomial projection.
  • Robustness to Non‑Smooth Functions: Finds that SUPNs can outperform standard polynomial projection even on functions with kinks or discontinuities, a regime where classic spectral methods usually struggle.

Methodology

  1. Polynomial Hidden Layer
    • Input vector x ∈ ℝⁿ is mapped to a vector of monomials up to a chosen total degree d (e.g., all terms x₁, x₁x₂, x₁², …).
    • Each monomial is multiplied by a learnable coefficient; the collection of coefficients forms a weight matrix W that is trained by gradient descent.
  2. Output Layer
    • The polynomial feature vector is fed into a standard linear (or shallow non‑linear) output layer to produce the final prediction.
  3. Training Protocol
    • SUPNs are trained with the same optimizers and loss functions commonly used for DNNs (e.g., Adam + MSE).
    • Because the hidden layer is shallow, back‑propagation is cheap and the loss landscape is far less riddled with spurious local minima.
  4. Theoretical Analysis
    • The authors leverage classical approximation theory (Jackson‑type inequalities) to bound the SUPN error by the best possible polynomial error of degree d.
    • They derive quasi‑optimal coefficient choices by solving a least‑squares problem on the training data, which serves as a good initialization for gradient‑based refinement (see the sketch after this list).
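Below is a minimal PyTorch sketch of steps 1–3 plus the least‑squares warm start from step 4. The names (monomial_exponents, SUPN, lstsq_init) and the toy 2‑D target are ours, not the authors' code, and for simplicity the sketch collapses the hidden coefficient matrix and the linear output layer into a single linear map over the monomial features; the paper's exact basis handling, output layer, and initialization may differ.

```python
# Illustrative SUPN sketch (our naming, not the authors' implementation).
import itertools

import torch
import torch.nn as nn


def monomial_exponents(n_inputs: int, degree: int) -> torch.Tensor:
    """All exponent vectors with total degree 1..degree (constant term handled by a bias)."""
    exps = []
    for d in range(1, degree + 1):
        for combo in itertools.combinations_with_replacement(range(n_inputs), d):
            e = [0] * n_inputs
            for i in combo:
                e[i] += 1
            exps.append(e)
    return torch.tensor(exps)  # shape: (n_monomials, n_inputs)


class SUPN(nn.Module):
    """Single polynomial hidden layer followed by a linear output layer."""

    def __init__(self, n_inputs: int, degree: int, n_outputs: int = 1):
        super().__init__()
        self.register_buffer("exponents", monomial_exponents(n_inputs, degree))
        self.out = nn.Linear(self.exponents.shape[0], n_outputs)

    def features(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, n_inputs) -> (batch, n_monomials): product of x_i ** alpha_i per monomial.
        return torch.prod(x.unsqueeze(1) ** self.exponents.to(x.dtype), dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(self.features(x))


def lstsq_init(model: SUPN, x: torch.Tensor, y: torch.Tensor) -> None:
    """Quasi-optimal warm start: fit coefficients and bias by least squares."""
    with torch.no_grad():
        phi = model.features(x)
        design = torch.cat([phi, torch.ones(phi.shape[0], 1)], dim=1)
        coef = torch.linalg.lstsq(design, y).solution  # (n_monomials + 1, n_outputs)
        model.out.weight.copy_(coef[:-1].T)
        model.out.bias.copy_(coef[-1])


# Training protocol: same tools as for DNNs (Adam + MSE) on a toy 2-D target.
x = torch.rand(512, 2) * 2 - 1                 # inputs in [-1, 1]^2
y = torch.sin(3 * x[:, :1]) * x[:, 1:]         # illustrative regression target
model = SUPN(n_inputs=2, degree=6)
lstsq_init(model, x, y)                        # closed-form warm start
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```

In this collapsed linear‑output special case the least‑squares fit already minimizes the training MSE, so the gradient refinement matters mainly once the output layer is non‑linear or a different loss is used.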

Results & Findings

| Setting | Model | Parameters (≈) | Avg. Test Error | Variability (Std.) |
| --- | --- | --- | --- | --- |
| 1‑D smooth function | SUPN | 150 | 1.2e-4 | 0.3e-4 |
| 1‑D smooth function | DNN (3-layer) | 1,500 | 9.8e-4 | 2.1e-4 |
| 2‑D non‑smooth (kink) | SUPN | 800 | 3.5e-3 | 0.4e-3 |
| 2‑D non‑smooth (kink) | KAN | 8,000 | 1.2e-2 | 1.0e-2 |
| 10‑D polynomial‑like | SUPN | 2,500 | 5.1e-3 | 0.6e-3 |
| 10‑D polynomial‑like | DNN (5-layer) | 25,000 | 7.8e-3 | 1.4e-3 |

Key take‑aways

  • Error vs. Parameter Count: For a fixed budget of trainable weights, SUPNs consistently achieve lower approximation error—often an order of magnitude better than DNNs/KANs.
  • Stability: The standard deviation across random seeds is markedly smaller for SUPNs, indicating reduced sensitivity to initialization.
  • Non‑Smooth Performance: Even when the target function contains discontinuities, SUPNs outperform plain polynomial projection, suggesting that the learned coefficients adapt to capture localized irregularities.

Practical Implications

  • Faster Prototyping: With far fewer parameters, SUPNs train in seconds on a CPU for low‑dimensional problems, making them ideal for rapid experimentation or edge‑device deployment.
  • Interpretability: Because the hidden representation is an explicit polynomial, developers can inspect coefficient magnitudes to understand feature interactions, something that stays opaque in deep nets (see the snippet after this list).
  • Reduced Overfitting: The compact parameter space acts as a built‑in regularizer, which is valuable when data is scarce (e.g., scientific simulations, sensor calibration).
  • Hybrid Pipelines: SUPNs can serve as a drop‑in replacement for the feature extractor block in existing pipelines (e.g., before a downstream classifier), delivering a lightweight yet expressive representation.
  • Compatibility with Existing Toolchains: Implementations require only standard tensor operations (monomial expansion, matrix multiplication), so they can be built on top of PyTorch, TensorFlow, or JAX without custom kernels.
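As a small illustration of the interpretability point, the snippet below ranks the learned coefficients of the Methodology sketch by magnitude and prints the corresponding monomials. It assumes the illustrative model and exponents buffer defined there, not any code from the paper.

```python
# Assumes the illustrative SUPN model from the Methodology sketch above.
# Rank the learned monomial coefficients by magnitude and print the top terms.
import torch

weights = model.out.weight.detach()[0]          # coefficients, shape (n_monomials,)
top = torch.topk(weights.abs(), k=5).indices
for idx in top:
    exps = model.exponents[idx].tolist()
    term = " * ".join(f"x{i + 1}^{e}" for i, e in enumerate(exps) if e > 0)
    print(f"{weights[idx].item():+.3e}  {term}")
```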

Limitations & Future Work

  • Scalability to Very High Dimensions: The number of monomials grows combinatorially with input dimension and polynomial degree (a quick count is shown after this list), which can become prohibitive beyond ~10‑15 dimensions unless sparsity or low‑rank tricks are employed.
  • Choice of Polynomial Basis: The paper uses total‑degree monomials; exploring orthogonal bases (e.g., Legendre, Chebyshev) or adaptive basis selection could further improve conditioning and accuracy.
  • Extension to Classification: Experiments focus on regression; applying SUPNs to categorical tasks (with softmax outputs) remains an open question.
  • Integration with Modern Regularizers: Investigating how dropout, weight decay, or spectral normalization interact with the polynomial layer could yield even more robust models.
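To make the first limitation concrete: the number of monomials of total degree at most d in n variables is C(n + d, d), which the short check below tabulates. The (n, d) pairs are illustrative, not values taken from the paper.

```python
# Number of monomials of total degree <= d in n variables: C(n + d, d),
# including the constant term. Illustrative (n, d) values only.
from math import comb

for n in (2, 10, 15, 20):
    for d in (3, 5, 8):
        print(f"n={n:2d}, d={d}: {comb(n + d, d):>12,} monomials")
```

At n = 15–20 and moderate degree, the dense basis already reaches hundreds of thousands to millions of terms, which is why sparsity or low‑rank structure becomes essential.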

Bottom line: SUPNs offer a compelling middle ground between the raw power of deep nets and the elegance of classical polynomial approximation, delivering high accuracy with a fraction of the parameters—a win for developers who need fast, interpretable, and reliable models.

Authors

  • Zachary Morrow
  • Michael Penwarden
  • Brian Chen
  • Aurya Javeed
  • Akil Narayan
  • John D. Jakeman

Paper Information

  • arXiv ID: 2511.21414v1
  • Categories: cs.LG, math.NA
  • Published: November 26, 2025