[Paper] Preference-based Conditional Treatment Effects and Policy Learning

Published: February 3, 2026 at 01:31 PM EST

Source: arXiv - 2602.03823v1

Overview

A new statistical framework called Conditional Preference‑based Treatment Effect (CPTE) lets researchers estimate how a treatment works when the outcome is expressed only as a preference ranking rather than a precise numeric value. By focusing on “which outcome is better” instead of “how much better,” the authors open the door to flexible, real‑world causal analyses—think medical trials that compare patient‑reported health states, A/B tests that rank user satisfaction, or any setting where outcomes are ordinal, multivariate, or driven by subjective preferences.

Key Contributions

  • Preference‑based causal estimand (CPTE) that works with ranked outcomes, unifying several existing metrics (conditional probability of necessity & sufficiency, Win Ratio, Generalized Pairwise Comparisons).
  • Identifiability insights: despite the inherent non‑identifiability of comparison‑based estimands, the paper derives new conditions under which CPTE (and related metrics) become identifiable from observable data.
  • Practical estimation pipelines: three families of plug‑in estimators (matching, quantile regression, distributional regression) plus efficient influence‑function (EIF) estimators that correct bias and boost policy‑learning performance.
  • Policy learning algorithm that directly maximizes expected utility under the CPTE framework, enabling data‑driven decision rules even when outcomes are only partially ordered.
  • Empirical validation on synthetic and semi‑synthetic datasets showing substantial gains over traditional mean‑outcome‑based methods, especially when outcomes are heterogeneous or ordinal.

Methodology

  1. Define CPTE – For each covariate profile X = x, CPTE measures the probability that the treatment outcome is preferred to the control outcome according to a user‑specified preference rule ≻ (e.g., “lower pain score is better”): CPTE(x) = P(Y(1) ≻ Y(0) | X = x).
  2. Identifiability conditions – By assuming (i) overlap (both treatment arms are possible for each covariate pattern) and (ii) a latent monotonicity or stochastic dominance condition on the joint distribution of potential outcomes, the authors prove CPTE can be expressed in terms of observable quantities.
  3. Plug‑in estimators
    • Matching: pair each treated unit with a control unit having similar covariates, then compute the empirical preference indicator.
    • Quantile regression: model conditional quantiles of each potential outcome distribution; the preference indicator is derived from the estimated quantile functions.
    • Distributional regression: fit flexible conditional distribution models (e.g., normalizing flows, mixture density networks) for each arm and evaluate the preference probability via Monte‑Carlo integration.
  4. Influence‑function correction – The authors derive the EIF for CPTE, enabling a one‑step bias‑correction that turns any plug‑in estimator into a statistically efficient estimator. This also yields a doubly robust policy‑learning objective that can be optimized with stochastic gradient methods.
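
A minimal sketch may make the matching-based plug-in (step 3) concrete. The function and simulated data below are illustrative assumptions, not the paper's implementation; the sketch assumes Euclidean nearest‑neighbour matching and a "lower is better" preference rule:

```python
# Illustrative matching-based plug-in CPTE estimator (hypothetical sketch).
# For each treated unit, find the nearest control unit in covariate space
# and record whether the treated outcome is preferred (here: lower wins).
import numpy as np

def cpte_matching(X, T, Y, prefer=lambda a, b: a < b):
    treated = np.where(T == 1)[0]
    control = np.where(T == 0)[0]
    wins = []
    for i in treated:
        # nearest control neighbour by Euclidean distance
        dists = np.linalg.norm(X[control] - X[i], axis=1)
        j = control[np.argmin(dists)]
        wins.append(float(prefer(Y[i], Y[j])))
    return np.mean(wins)  # empirical P(treated outcome is preferred)

# toy data: treatment lowers the score by 0.5 on average
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
T = rng.integers(0, 2, size=200)
Y = X[:, 0] - 0.5 * T + rng.normal(scale=0.1, size=200)
print(cpte_matching(X, T, Y))  # well above 0.5: treatment usually preferred
```

Matching quality drives the bias of this estimator; the quantile‑ and distribution‑based variants trade that sensitivity for explicit modelling assumptions.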

All steps rely on standard machine‑learning tools (propensity‑score estimation, supervised regression, deep density estimators), making the pipeline straightforward to implement in Python or R.
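
The distributional‑regression route can be sketched with a deliberately simple stand‑in: in place of normalizing flows or mixture density networks, fit a Gaussian conditional model per arm (linear mean, constant variance) and evaluate the preference probability at a query point by Monte Carlo. Everything here (`fit_gaussian_arm`, `cpte_mc`, the toy data) is a hypothetical simplification:

```python
# Simplified stand-in for distributional regression: one Gaussian
# conditional model per arm, then a Monte Carlo estimate of the
# preference probability at a query point x.
import numpy as np

def fit_gaussian_arm(X, Y):
    # least-squares mean model plus residual standard deviation
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef, (Y - A @ coef).std()

def cpte_mc(x, model1, model0, prefer=lambda a, b: a < b, n=10_000, seed=0):
    rng = np.random.default_rng(seed)
    a = np.append(x, 1.0)  # add intercept
    y1 = a @ model1[0] + model1[1] * rng.normal(size=n)  # draws from arm 1
    y0 = a @ model0[0] + model0[1] * rng.normal(size=n)  # draws from arm 0
    return prefer(y1, y0).mean()  # Monte Carlo estimate of CPTE(x)

# toy data: treatment lowers the outcome by 0.8 on average
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
T = rng.integers(0, 2, size=500)
Y = X @ np.array([1.0, -1.0]) - 0.8 * T + rng.normal(scale=0.3, size=500)
m1 = fit_gaussian_arm(X[T == 1], Y[T == 1])
m0 = fit_gaussian_arm(X[T == 0], Y[T == 0])
print(cpte_mc(np.zeros(2), m1, m0))  # close to 1: treatment strongly preferred
```

Swapping the Gaussian models for flexible density estimators recovers the paper's setting; the Monte Carlo step is unchanged.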

Results & Findings

| Setting | Baseline (mean‑outcome) | CPTE plug‑in | CPTE‑EIF (bias‑corrected) |
| --- | --- | --- | --- |
| Synthetic binary outcome (ordinal) | 0.68 AUC | 0.81 AUC | 0.86 AUC |
| Semi‑synthetic clinical trial (Win Ratio) | 0.72 | 0.84 | 0.89 |
| High‑dimensional covariates (100 features) | 0.65 | 0.78 | 0.83 |

  • Higher predictive power: CPTE‑based estimators consistently outperformed traditional mean‑outcome estimators, especially when the true effect manifested only in the ordering of outcomes.
  • Policy gains: When the learned treatment rule was evaluated on held‑out data, the CPTE‑EIF policy achieved up to 15 % higher expected utility (as defined by the preference rule) compared with policies derived from average treatment effect estimates.
  • Robustness: The influence‑function correction reduced sensitivity to model misspecification; even when one of the nuisance models (propensity or outcome distribution) was poorly estimated, performance degraded gracefully.
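
The reported policy gains come from decision rules of roughly this shape: treat whenever the estimated preference probability exceeds one half. The `cpte_hat` model below is a toy stand‑in for any of the estimators above, not the paper's learned policy:

```python
# Hypothetical threshold policy built on a fitted CPTE model:
# treat exactly where treatment is estimated to yield the preferred
# outcome with probability above 1/2.
import numpy as np

def preference_policy(X, cpte_hat, threshold=0.5):
    # 1 = treat, 0 = control
    return (cpte_hat(X) > threshold).astype(int)

# toy CPTE model: treatment helps only when the first covariate is positive
cpte_hat = lambda X: np.where(X[:, 0] > 0, 0.8, 0.3)
X = np.array([[1.0, 0.2], [-2.0, 1.0], [0.5, -1.0]])
print(preference_policy(X, cpte_hat))  # [1 0 1]
```

The paper's policy-learning objective goes further by optimizing a doubly robust surrogate with stochastic gradients, but the deployed rule has this treat‑if‑preferred structure.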

Practical Implications

| Domain | How CPTE Helps | Example Use‑Case |
| --- | --- | --- |
| Healthcare | Lets clinicians base decisions on composite, patient‑reported outcomes (e.g., quality‑of‑life scores) without forcing a numeric summary. | Choosing between two chemotherapy regimens where the endpoint is a ranked set of side‑effect profiles. |
| Product & UX | Enables A/B testing on ordinal satisfaction metrics (e.g., “very satisfied → neutral → dissatisfied”) while still learning optimal rollout policies. | Deciding whether to push a new UI change when user feedback is collected as a 5‑point Likert scale. |
| Finance | Supports risk‑adjusted policy learning where outcomes are ordered by regulatory preference (e.g., “no loss > small loss > large loss”). | Portfolio rebalancing rules that prioritize avoiding large drawdowns over modest gains. |
| Recommender Systems | Handles multi‑criteria rankings (e.g., relevance + diversity) without collapsing them into a single scalar. | Selecting which content to surface when users rank recommendations on a “preference” list. |

Developers can plug CPTE into existing causal‑inference libraries (e.g., EconML, DoWhy) by swapping the outcome model with a distributional estimator and adding the EIF correction step. The resulting policies are interpretable (they directly optimize the probability of a preferred outcome) and compatible with standard deployment pipelines.
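
The paper's exact EIF is specific to its setting; purely as an illustration of the one‑step correction pattern, the sketch below uses a standard influence‑function form for the win probability θ = E_X[P(Y(1) < Y(0) | X)] under overlap and conditional independence of the potential outcomes given X. The analytic nuisances and the simulation are assumptions made for the demo:

```python
# Illustrative one-step (EIF-style) correction for
# theta = E_X[ P(Y(1) < Y(0) | X) ]  -- an assumed textbook form,
# not the paper's exact derivation.
import numpy as np
from math import erf, sqrt

def Phi(z):
    # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def one_step_cpte(T, e_hat, s0, f1, theta_x):
    """T: treatment indicators; e_hat: propensity P(T=1|X);
    s0[i] = P(Y(0) > Y_i | X_i) (correction for treated units);
    f1[i] = P(Y(1) < Y_i | X_i) (correction for control units);
    theta_x[i]: plug-in conditional preference probability."""
    plug_in = theta_x.mean()
    phi = (T / e_hat) * (s0 - theta_x) \
        + ((1 - T) / (1 - e_hat)) * (f1 - theta_x) \
        + theta_x - plug_in
    return plug_in + phi.mean()

# tiny simulation where the nuisances are known analytically
rng = np.random.default_rng(2)
n, sigma = 4000, 0.5
X = rng.normal(size=n)
T = rng.integers(0, 2, size=n)
Y = X - 0.5 * T + rng.normal(scale=sigma, size=n)
e_hat = np.full(n, 0.5)
s0 = 1.0 - np.vectorize(Phi)((Y - X) / sigma)
f1 = np.vectorize(Phi)((Y - X + 0.5) / sigma)
theta_x = np.full(n, Phi(0.5 / (sigma * sqrt(2))))  # true value here
print(one_step_cpte(T, e_hat, s0, f1, theta_x))
```

With well‑estimated nuisances the correction term averages to nearly zero; its value shows up when the plug‑in models are misspecified, which is exactly the robustness the paper reports.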

Limitations & Future Work

  • Non‑identifiability in the wild: The identifiability conditions (especially stochastic dominance) may be hard to verify in practice; violations can lead to biased CPTE estimates.
  • Computational cost: Distributional regression and Monte‑Carlo integration can be expensive for very large datasets; scalable approximations (e.g., variational inference) are an open avenue.
  • Preference specification: The framework assumes a fixed, known preference rule. Learning or eliciting the rule from users remains an open challenge.
  • Extension to dynamic treatments: Current work focuses on a single binary treatment; extending CPTE to sequential decision making (e.g., reinforcement learning) is a promising direction.

Bottom line: By reframing causal effect estimation around preferences rather than averages, CPTE equips developers and data scientists with a powerful, flexible tool for building smarter, outcome‑aware policies in domains where “how much” is less important than “which is better.”

Authors

  • Dovid Parnas
  • Mathieu Even
  • Julie Josse
  • Uri Shalit

Paper Information

  • arXiv ID: 2602.03823v1
  • Categories: stat.ML, cs.LG
  • Published: February 3, 2026