[Paper] Riesz Representer Fitting under Bregman Divergence: A Unified Framework for Debiased Machine Learning

Published: January 12, 2026
Source: arXiv (2601.07752v1)

Overview

The paper introduces a unified framework for estimating the Riesz representer—a core component in debiased machine learning methods used for causal inference and structural parameter estimation. By framing the estimation problem as fitting under a Bregman divergence, the author shows that many seemingly disparate techniques (e.g., Riesz regression, covariate‑balancing weights, entropy balancing) are actually special cases of the same underlying optimization.

Key Contributions

  • Unified Bregman‑Divergence Formulation – Shows that fitting the Riesz representer under any Bregman divergence subsumes existing methods (squared loss → Riesz regression; KL divergence → entropy‑balancing weights); the divergence itself is written out just after this list.
  • Automatic Covariate Balancing – Derives a dual interpretation where the optimal dual variables correspond to stable balancing weights, eliminating the need for hand‑crafted balancing constraints.
  • Generalized Riesz Regression – Extends classic Riesz regression to a broader class of loss functions, enabling more flexible model choices.
  • Link to Density‑Ratio Estimation – Demonstrates that density‑ratio fitting is a special case of the proposed framework, bridging causal inference and unsupervised learning tools.
  • Theoretical Guarantees – Provides convergence rates for both RKHS (kernel‑based) and neural‑network function classes, showing the method’s statistical soundness in high‑dimensional settings.
  • Practical Algorithmic Blueprint – Supplies a clear recipe for implementing the generalized estimator using off‑the‑shelf solvers (e.g., stochastic gradient descent for neural nets, kernel ridge regression for RKHS).
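
For reference, the Bregman divergence generated by a convex \( \phi \) is the standard textbook object (stated here for convenience; the paper's own notation may differ slightly):

\[ D_\phi(a, b) = \phi(a) - \phi(b) - \phi'(b)\,(a - b). \]

Taking \( \phi(u)=\tfrac12 u^2 \) gives \( D_\phi(a,b)=\tfrac12 (a-b)^2 \) (squared loss, hence Riesz regression), while \( \phi(u)=u\log u - u \) gives the generalized KL divergence \( D_\phi(a,b)=a\log(a/b) - a + b \), which underlies the entropy‑balancing case.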

Methodology

  1. Problem Setup

    • The Riesz representer \( \alpha^*(\cdot) \) satisfies a linear functional relationship: for any function \( f \) in a Hilbert space, \( \langle \alpha^*, f \rangle = \psi(f) \), where \( \psi \) is the target functional (e.g., a causal effect).
  2. Bregman Divergence Objective

    • Choose a convex generator \( \phi \) (e.g., \( \phi(u)=\tfrac12 u^2 \) for squared loss or \( \phi(u)=u\log u - u \) for KL).
    • Fit a parametric model \( \alpha_\theta \) by minimizing the empirical Bregman divergence between the model’s predictions and the true (unknown) representer:
      \[ \min_\theta \frac{1}{n}\sum_{i=1}^n D_\phi\bigl(\alpha_\theta(X_i),\; \text{target}_i\bigr). \]
    • The “target” values are constructed from observed data and the functional \( \psi \) (e.g., residuals from a nuisance model). A code sketch of this fitting step follows the list.
  3. Dual Interpretation

    • By convex duality, the minimization yields a dual problem whose solution gives balancing weights \( w_i \).
    • For squared loss, the dual weights coincide with those from classic Riesz regression; for KL, they become entropy‑balancing weights that automatically satisfy covariate balance constraints.
  4. Model Classes

    • RKHS: Use kernel functions to represent \( \alpha_\theta \); the optimization reduces to a kernel ridge problem with a Bregman‑type loss.
    • Neural Networks: Parameterize \( \alpha_\theta \) with a deep net and train with stochastic gradient descent, leveraging automatic differentiation for any Bregman loss.
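
To make steps 2–4 concrete, here is a minimal PyTorch sketch of the generic fitting loop, assuming the per‑observation targets have already been constructed from first‑stage nuisance estimates (for intuition, in the canonical average‑treatment‑effect example the representer is the inverse‑propensity weight \( \alpha^*(d,w) = d/e(w) - (1-d)/(1-e(w)) \), a standard fact not specific to this paper). The helper `bregman_loss`, the generators `phi_sq`/`phi_kl`, and the small `alpha_net` are illustrative names under those assumptions, not code from the paper.

```python
import torch
import torch.nn as nn

def bregman_loss(phi, pred, target):
    """Empirical Bregman divergence D_phi(pred, target), averaged over the batch.

    phi is an elementwise convex generator built from torch ops; its derivative
    at the target is obtained via autograd, so any smooth generator works.
    """
    target = target.detach().requires_grad_(True)
    phi_t = phi(target)
    (grad_t,) = torch.autograd.grad(phi_t.sum(), target)
    return (phi(pred) - phi_t.detach() - grad_t.detach() * (pred - target.detach())).mean()

# Two generators discussed above: squared loss (Riesz regression) and KL (entropy balancing).
phi_sq = lambda u: 0.5 * u ** 2
phi_kl = lambda u: u * torch.log(u) - u  # requires strictly positive inputs

# Hypothetical data: covariates X and pre-computed "target_i" values (placeholders).
n, d = 512, 5
X = torch.randn(n, d)
targets = torch.rand(n, 1) + 0.1

# Softplus keeps alpha_theta positive so the KL generator is well defined.
alpha_net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1), nn.Softplus())
opt = torch.optim.Adam(alpha_net.parameters(), lr=1e-3)

for _ in range(200):
    opt.zero_grad()
    loss = bregman_loss(phi_kl, alpha_net(X), targets)  # swap in phi_sq for squared loss
    loss.backward()
    opt.step()
```

For the RKHS route in step 4, the same loss can be minimized over \( \alpha_\theta(x) = \sum_j \theta_j k(x, x_j) \) with a ridge‑style penalty on \( \theta \), which reduces to the kernel problem described above.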

Results & Findings

| Setting | Loss (Bregman) | Method Recovered | Empirical Observation |
| --- | --- | --- | --- |
| Squared loss | \( \phi(u)=\tfrac12 u^2 \) | Riesz regression | Comparable bias reduction to classic debiased estimators; variance matches theoretical predictions. |
| KL divergence | \( \phi(u)=u\log u - u \) | Entropy balancing | Produces stable weights with lower variance than manually tuned balancing constraints. |
| General Bregman | Any convex \( \phi \) | New estimators | Demonstrates flexibility: e.g., a Huber‑type loss yields robustness to outliers. |

The convergence analysis shows that, under standard smoothness assumptions, the estimator attains \( O_p(n^{-1/2}) \) rates in both RKHS and neural‑net settings, matching the optimal rate for semiparametric inference.

Practical Implications

  • One‑Stop Shop for Debiased Estimation – Practitioners can pick a loss that best matches their data (e.g., KL for positivity constraints, Huber for heavy‑tailed outcomes) without redesigning the whole pipeline.
  • Automatic Weight Generation – The dual formulation eliminates manual covariate‑balancing steps, simplifying workflows for causal inference in A/B testing, policy evaluation, and uplift modeling; a snippet after this list shows where the fitted representer enters the final estimate.
  • Scalable to Modern ML Stacks – Because the method works with neural nets, it can be plugged into existing deep‑learning pipelines (PyTorch, TensorFlow) and benefit from GPU acceleration.
  • Bridges Causal and Unsupervised Learning – The density‑ratio perspective opens doors to reuse tools from domain adaptation, importance sampling, and generative modeling for causal tasks.
  • Better Regularization Choices – By selecting a Bregman divergence aligned with the problem geometry, developers can achieve lower variance or robustness without extra hyper‑parameter tuning.
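
Once \( \hat\alpha \) is fitted, it enters the familiar debiased (Neyman‑orthogonal) estimate alongside an outcome model \( \hat g \). The snippet below is the textbook plug‑in‑plus‑correction formula, included only to show where the representer is used downstream; the function name and inputs are illustrative, not taken from the paper.

```python
import numpy as np

def debiased_estimate(m_of_g, alpha_hat, y, g_hat):
    """Textbook debiased ML estimate: plug-in term plus Riesz-weighted residual correction.

    m_of_g:    m(W_i; g_hat), the target functional applied to the fitted outcome model
    alpha_hat: fitted Riesz representer values alpha_hat(W_i)
    y, g_hat:  observed outcomes and outcome-model predictions at W_i
    """
    return float(np.mean(m_of_g + alpha_hat * (y - g_hat)))
```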

Limitations & Future Work

  • Dependence on Nuisance Estimates – The quality of the Riesz representer still hinges on accurate first‑stage nuisance models (e.g., propensity scores, outcome regressions).
  • Computational Overhead for Large RKHS – Kernel methods may become prohibitive in massive datasets; the paper suggests random feature approximations but leaves detailed scalability studies for later.
  • Choice of Bregman Divergence – While the framework is flexible, guidance on selecting the “right” divergence for a given application is still empirical.
  • Extension to Time‑Series / Panel Data – The current theory assumes i.i.d. observations; extending to dependent data structures is an open research direction.

Bottom line: The paper offers a powerful, mathematically grounded yet practically implementable toolkit for debiased machine learning, turning a collection of ad‑hoc tricks into a single, extensible optimization problem. Developers can now leverage familiar ML libraries to obtain statistically robust causal estimates with far less manual fiddling.

Authors

  • Masahiro Kato

Paper Information

  • arXiv ID: 2601.07752v1
  • Categories: econ.EM, cs.LG, math.ST, stat.ME, stat.ML
  • Published: January 12, 2026