[Paper] A Note on Non-Negative $L_1$-Approximating Polynomials

Published: (May 8, 2026 at 01:55 PM EDT)
5 min read
Source: arXiv

Source: arXiv - 2605.08072v1

Overview

A recent short note by Lee, Mehrotra, and Zampetakis tackles a subtle but important twist on a classic tool in learning theory: (L_1)-approximating polynomials. While ordinary (L_1) approximators can dip below zero, many modern algorithms (e.g., those that learn from only positive examples) need non‑negative approximations. The authors show that for any set whose Gaussian surface area (GSA) is bounded, you can construct low‑degree polynomials that stay non‑negative everywhere and still approximate the set’s indicator function in the (L_1) sense under a Gaussian distribution.

Key Contributions

  • Non‑negative (L_1) approximators: Prove existence of degree‑(k) polynomials that are always (\ge 0) and (\varepsilon)-close to the target indicator in (L_1) under the standard Gaussian.
  • Degree bound matching the unrestricted case: Show that the required degree is (\tilde{O}!\left(\frac{\Gamma^2}{\varepsilon^2}\right)), where (\Gamma) is the GSA of the set—essentially the same order as the best known bound when non‑negativity is not required.
  • Bridging two approximation regimes: Clarify the relationship between ordinary (L_1) approximation, non‑negative approximation, and the stronger “sandwiching” polynomials used in pseudorandomness.
  • Concrete link to smoothed positive‑only learning: Highlight how the result directly supports recent algorithms that learn from only positively labeled data under Gaussian noise.

Methodology

  1. Gaussian surface area (GSA) as a complexity measure:
    • The authors use GSA, a geometric quantity that captures how “wiggly” the boundary of a set is under a Gaussian measure. Low GSA means the set’s boundary is relatively smooth.
  2. Fourier‑analytic construction:
    • They expand the indicator function in the Hermite polynomial basis (the natural orthogonal basis for Gaussian space).
    • By truncating the expansion at degree (k), they obtain a polynomial (p_k) that approximates the indicator in (L_1).
  3. Ensuring non‑negativity:
    • Instead of taking the raw truncation, they apply a simple shift and squaring trick: add a small constant to lift the polynomial above zero, then square it (or use a ReLU‑style clipping) to guarantee the output never goes negative.
    • Careful analysis shows this transformation only inflates the (L_1) error by a factor that can be absorbed into the (\varepsilon) budget, provided (k) scales as (\tilde{O}(\Gamma^2/\varepsilon^2)).
  4. Technical lemmas:
    • The note leverages known bounds on the Hermite coefficients of sets with bounded GSA and a concentration inequality for Gaussian polynomials to control the tail error after truncation.

Results & Findings

  • Existence theorem: For any measurable set (S \subset \mathbb{R}^n) with Gaussian surface area (\Gamma), there exists a polynomial (p) of degree
    [ k = \tilde{O}!\left(\frac{\Gamma^2}{\varepsilon^2}\right) ]
    such that
    [ p(x) \ge 0 \quad \text{for all } x,\qquad \mathbb{E}_{x \sim \mathcal{N}(0,I)}\bigl[|p(x)-\mathbf{1}_S(x)|\bigr] \le \varepsilon . ]
  • Tightness: This degree matches (up to constants hidden in the (\tilde{O}) notation) the best known bound for unconstrained (L_1) approximation, meaning the non‑negativity requirement does not force a higher-degree polynomial in the worst case.
  • Implication for sandwiching polynomials: Since sandwiching polynomials require both an upper and a lower bound, the authors’ construction sits neatly between ordinary approximators and full sandwiching, offering a lighter-weight tool when only a lower bound (non‑negativity) is needed.

Practical Implications

  • Positive‑only learning pipelines: Many real‑world data collection processes (e.g., click‑through data, medical diagnosis) yield only positive examples. Algorithms that rely on a non‑negative surrogate for the target concept can now use low‑degree Gaussian polynomials with provable guarantees, simplifying model design and analysis.
  • Feature engineering for Gaussian‑like data: In domains where inputs are naturally modeled as Gaussian (e.g., sensor noise, embeddings after whitening), developers can replace expensive kernel methods with compact polynomial features that respect the non‑negativity constraint, leading to faster inference.
  • Robustness to noise: The construction is inherently “smoothed” because it works under the Gaussian measure; this aligns well with techniques that add Gaussian noise for differential privacy or regularization, offering a theoretical safety net.
  • Simplified pseudorandomness checks: While full sandwiching polynomials are used to certify that a distribution fools certain tests, the new non‑negative approximators can serve as a cheaper proxy when only one‑sided guarantees are sufficient (e.g., detecting anomalies that should never be negative).

Limitations & Future Work

  • Scope limited to Gaussian distributions: The proof heavily exploits properties of the standard Gaussian (Hermite basis, isoperimetry). Extending the result to other product distributions (e.g., uniform on the hypercube) remains open.
  • Dependence on GSA: Sets with large Gaussian surface area may require prohibitively high degree; finding tighter bounds for specific structured sets (e.g., halfspaces, low‑dimensional manifolds) could make the approach more practical.
  • Constructive algorithm: The paper establishes existence but does not provide an explicit, efficient algorithm for computing the polynomial given a black‑box description of the set. Future work could focus on turning the proof into a usable learning routine.
  • Empirical validation: No experiments are presented. Benchmarking the non‑negative approximators against existing methods on real datasets would help gauge the constant factors hidden in the (\tilde{O}) notation.

Bottom line: This note shows that, at least for Gaussian‑smooth sets, demanding non‑negativity does not cost you extra polynomial degree. For developers building models that must respect positivity—think probability estimators, risk scores, or any “only‑positive” signal—this result opens the door to lightweight, theoretically grounded polynomial approximations.

Authors

  • Jane H. Lee
  • Anay Mehrotra
  • Manolis Zampetakis

Paper Information

  • arXiv ID: 2605.08072v1
  • Categories: stat.ML, cs.DS, cs.LG, math.ST
  • Published: May 8, 2026
  • PDF: Download PDF
0 views
Back to Blog

Related posts

Read more »

[Paper] Normalizing Trajectory Models

Diffusion-based models decompose sampling into many small Gaussian denoising steps -- an assumption that breaks down when generation is compressed to a few coar...