[Paper] Riesz Representer Fitting under Bregman Divergence: A Unified Framework for Debiased Machine Learning

Published: January 12, 2026
Source: arXiv (2601.07752v1)

Overview

The paper introduces a unified framework for estimating the Riesz representer—a core component in debiased machine learning methods used for causal inference and structural parameter estimation. By framing the estimation problem as fitting under a Bregman divergence, the author shows that many seemingly disparate techniques (e.g., Riesz regression, covariate‑balancing weights, entropy balancing) are actually special cases of the same underlying optimization.

Key Contributions

  • Unified Bregman‑Divergence Formulation – Shows that fitting the Riesz representer under any Bregman divergence subsumes existing methods (squared loss → Riesz regression; KL divergence → entropy‑balancing weights); the divergence itself is written out just after this list.
  • Automatic Covariate Balancing – Derives a dual interpretation where the optimal dual variables correspond to stable balancing weights, eliminating the need for hand‑crafted balancing constraints.
  • Generalized Riesz Regression – Extends classic Riesz regression to a broader class of loss functions, enabling more flexible model choices.
  • Link to Density‑Ratio Estimation – Demonstrates that density‑ratio fitting is a special case of the proposed framework, bridging causal inference and unsupervised learning tools.
  • Theoretical Guarantees – Provides convergence rates for both RKHS (kernel‑based) and neural‑network function classes, showing the method’s statistical soundness in high‑dimensional settings.
  • Practical Algorithmic Blueprint – Supplies a clear recipe for implementing the generalized estimator using off‑the‑shelf solvers (e.g., stochastic gradient descent for neural nets, kernel ridge regression for RKHS).
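
For reference, the Bregman divergence generated by a convex \( \phi \) is the standard textbook object (stated here for convenience; the paper's own notation may differ slightly):

\[ D_\phi(a, b) = \phi(a) - \phi(b) - \phi'(b)\,(a - b). \]

Taking \( \phi(u)=\tfrac12 u^2 \) gives \( D_\phi(a,b)=\tfrac12 (a-b)^2 \) (squared loss, hence Riesz regression), while \( \phi(u)=u\log u - u \) gives the generalized KL divergence \( D_\phi(a,b)=a\log(a/b) - a + b \), which underlies the entropy‑balancing case.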

Methodology

  1. Problem Setup

    • The Riesz representer \( \alpha^*(\cdot) \) satisfies a linear functional relationship: for any function \( f \) in a Hilbert space, \( \langle \alpha^*, f \rangle = \psi(f) \), where \( \psi \) is the target functional (e.g., a causal effect).
  2. Bregman Divergence Objective

    • Choose a convex generator \( \phi \) (e.g., \( \phi(u)=\tfrac12 u^2 \) for squared loss or \( \phi(u)=u\log u - u \) for KL).
    • Fit a parametric model \( \alpha_\theta \) by minimizing the empirical Bregman divergence between the model’s predictions and the true (unknown) representer:
      \[ \min_\theta \frac{1}{n}\sum_{i=1}^n D_\phi\bigl(\alpha_\theta(X_i),\; \text{target}_i\bigr). \]
    • The “target” values are constructed from observed data and the functional \( \psi \) (e.g., residuals from a nuisance model). A code sketch of this fitting step follows the list.
  3. Dual Interpretation

    • By convex duality, the minimization yields a dual problem whose solution gives balancing weights \( w_i \).
    • For squared loss, the dual weights coincide with those from classic Riesz regression; for KL, they become entropy‑balancing weights that automatically satisfy covariate balance constraints.
  4. Model Classes

    • RKHS: Use kernel functions to represent \( \alpha_\theta \); the optimization reduces to a kernel ridge problem with a Bregman‑type loss.
    • Neural Networks: Parameterize \( \alpha_\theta \) with a deep net and train with stochastic gradient descent, leveraging automatic differentiation for any Bregman loss.
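
To make steps 2–4 concrete, here is a minimal PyTorch sketch of the generic fitting loop, assuming the per‑observation targets have already been constructed from first‑stage nuisance estimates (for intuition, in the canonical average‑treatment‑effect example the representer is the inverse‑propensity weight \( \alpha^*(d,w) = d/e(w) - (1-d)/(1-e(w)) \), a standard fact not specific to this paper). The helper `bregman_loss`, the generators `phi_sq`/`phi_kl`, and the small `alpha_net` are illustrative names under those assumptions, not code from the paper.

```python
import torch
import torch.nn as nn

def bregman_loss(phi, pred, target):
    """Empirical Bregman divergence D_phi(pred, target), averaged over the batch.

    phi is an elementwise convex generator built from torch ops; its derivative
    at the target is obtained via autograd, so any smooth generator works.
    """
    target = target.detach().requires_grad_(True)
    phi_t = phi(target)
    (grad_t,) = torch.autograd.grad(phi_t.sum(), target)
    return (phi(pred) - phi_t.detach() - grad_t.detach() * (pred - target.detach())).mean()

# Two generators discussed above: squared loss (Riesz regression) and KL (entropy balancing).
phi_sq = lambda u: 0.5 * u ** 2
phi_kl = lambda u: u * torch.log(u) - u  # requires strictly positive inputs

# Hypothetical data: covariates X and pre-computed "target_i" values (placeholders).
n, d = 512, 5
X = torch.randn(n, d)
targets = torch.rand(n, 1) + 0.1

# Softplus keeps alpha_theta positive so the KL generator is well defined.
alpha_net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1), nn.Softplus())
opt = torch.optim.Adam(alpha_net.parameters(), lr=1e-3)

for _ in range(200):
    opt.zero_grad()
    loss = bregman_loss(phi_kl, alpha_net(X), targets)  # swap in phi_sq for squared loss
    loss.backward()
    opt.step()
```

For the RKHS route in step 4, the same loss can be minimized over \( \alpha_\theta(x) = \sum_j \theta_j k(x, x_j) \) with a ridge‑style penalty on \( \theta \), which reduces to the kernel problem described above.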

Results & Findings

| Setting | Loss (Bregman) | Method Recovered | Empirical Observation |
| --- | --- | --- | --- |
| Squared loss | \( \phi(u)=\tfrac12 u^2 \) | Riesz regression | Comparable bias reduction to classic debiased estimators; variance matches theoretical predictions. |
| KL divergence | \( \phi(u)=u\log u - u \) | Entropy balancing | Produces stable weights with lower variance than manually tuned balancing constraints. |
| General Bregman | Any convex \( \phi \) | New estimators | Demonstrates flexibility: e.g., a Huber‑type loss yields robustness to outliers. |

The convergence analysis shows that, under standard smoothness assumptions, the estimator attains \( O_p(n^{-1/2}) \) rates in both RKHS and neural‑net settings, matching the optimal rate for semiparametric inference.

Practical Implications

  • One‑Stop Shop for Debiased Estimation – Practitioners can pick a loss that best matches their data (e.g., KL for positivity constraints, Huber for heavy‑tailed outcomes) without redesigning the whole pipeline.
  • Automatic Weight Generation – The dual formulation eliminates manual covariate‑balancing steps, simplifying workflows for causal inference in A/B testing, policy evaluation, and uplift modeling; a snippet after this list shows where the fitted representer enters the final estimate.
  • Scalable to Modern ML Stacks – Because the method works with neural nets, it can be plugged into existing deep‑learning pipelines (PyTorch, TensorFlow) and benefit from GPU acceleration.
  • Bridges Causal and Unsupervised Learning – The density‑ratio perspective opens doors to reuse tools from domain adaptation, importance sampling, and generative modeling for causal tasks.
  • Better Regularization Choices – By selecting a Bregman divergence aligned with the problem geometry, developers can achieve lower variance or robustness without extra hyper‑parameter tuning.
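
Once \( \hat\alpha \) is fitted, it enters the familiar debiased (Neyman‑orthogonal) estimate alongside an outcome model \( \hat g \). The snippet below is the textbook plug‑in‑plus‑correction formula, included only to show where the representer is used downstream; the function name and inputs are illustrative, not taken from the paper.

```python
import numpy as np

def debiased_estimate(m_of_g, alpha_hat, y, g_hat):
    """Textbook debiased ML estimate: plug-in term plus Riesz-weighted residual correction.

    m_of_g:    m(W_i; g_hat), the target functional applied to the fitted outcome model
    alpha_hat: fitted Riesz representer values alpha_hat(W_i)
    y, g_hat:  observed outcomes and outcome-model predictions at W_i
    """
    return float(np.mean(m_of_g + alpha_hat * (y - g_hat)))
```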

Limitations & Future Work

  • Dependence on Nuisance Estimates – The quality of the Riesz representer still hinges on accurate first‑stage nuisance models (e.g., propensity scores, outcome regressions).
  • Computational Overhead for Large RKHS – Kernel methods may become prohibitive in massive datasets; the paper suggests random feature approximations but leaves detailed scalability studies for later.
  • Choice of Bregman Divergence – While the framework is flexible, guidance on selecting the “right” divergence for a given application is still empirical.
  • Extension to Time‑Series / Panel Data – The current theory assumes i.i.d. observations; extending to dependent data structures is an open research direction.

Bottom line: The paper offers a powerful, mathematically grounded yet practically implementable toolkit for debiased machine learning, turning a collection of ad‑hoc tricks into a single, extensible optimization problem. Developers can now leverage familiar ML libraries to obtain statistically robust causal estimates with far less manual fiddling.

Authors

  • Masahiro Kato

Paper Information

  • arXiv ID: 2601.07752v1
  • Categories: econ.EM, cs.LG, math.ST, stat.ME, stat.ML
  • Published: January 12, 2026