[Paper] Synthetic-Powered Multiple Testing with FDR Control

Published: February 18, 2026
Source: arXiv (2602.16690v1)

Overview

The paper presents SynthBH, a new multiple‑testing framework that can safely incorporate synthetic (or auxiliary) data—such as simulations, historical experiments, or outputs from generative models—into the classic false discovery rate (FDR) control pipeline. By doing so, it can boost statistical power when the synthetic data are informative, while still guaranteeing rigorous FDR control even when those data are noisy or misspecified.

Key Contributions

  • Synthetic‑powered BH procedure: Extends the classic Benjamini–Hochberg (BH) method to fuse real and synthetic p‑values without sacrificing finite‑sample, distribution‑free FDR guarantees.
  • Robustness to synthetic quality: The algorithm automatically adapts; it gains power when synthetic data are high‑quality and falls back to standard BH behavior when they are not, never exceeding the target FDR.
  • Mild dependence assumption: Guarantees hold under a PRDS‑type (positive regression dependence on a subset) condition, which is far weaker than the independence assumptions required by many existing methods.
  • Theoretical proof of FDR control: Provides a rigorous finite‑sample bound that does not rely on the synthetic p‑values being valid under the null hypothesis.
  • Empirical validation: Demonstrates superior detection of outliers in tabular datasets and stronger drug‑cancer sensitivity associations in genomic studies, alongside extensive simulation studies.

Methodology

  1. Data Setup

    • Real data: A set of $m$ hypothesis tests with corresponding p‑values $p_1, \dots, p_m$.
    • Synthetic data: For each hypothesis, a synthetic p‑value $\tilde p_i$ generated from an auxiliary source (e.g., a pretrained generative model, a related experiment).
  2. Weighted Combination

    • Compute a synthetic weight $w_i \in [0, 1]$ that reflects how trustworthy the synthetic p‑value appears for hypothesis $i$. The weight is derived from a simple calibration step (e.g., comparing the distribution of $\tilde p_i$ under null vs. alternative using a small validation set).
  3. Synthetic‑Powered BH (SynthBH)

    • Form a combined p‑value $q_i = w_i \tilde p_i + (1 - w_i)\, p_i$.
    • Apply the standard BH step‑up procedure to the ordered $q_i$’s: find the largest $k$ such that $q_{(k)} \le \frac{k}{m}\alpha$ and reject all hypotheses with $q_i \le q_{(k)}$.
  4. Theoretical Guarantees

    • Under the PRDS condition on the joint distribution of $(p_i, \tilde p_i)$, the authors prove that the expected proportion of false discoveries never exceeds the nominal level $\alpha$.
    • No assumption that the $\tilde p_i$ are uniformly distributed under the null is needed; they can be arbitrarily biased, and the weighting scheme will down‑weight them accordingly.
  5. Adaptivity

    • The weighting step is data‑driven, so the algorithm “learns” the synthetic data quality on the fly. If synthetic signals are weak, $w_i$ shrinks toward zero, reverting the method to ordinary BH.
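The pipeline above (weighted combination followed by a BH step-up) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the weight-calibration step is omitted (the paper derives weights from a validation set), so `weights` is taken as given, and the function names `bh_stepup` and `synth_bh` are my own.

```python
def bh_stepup(pvals, alpha=0.1):
    """Standard Benjamini-Hochberg step-up: return sorted indices of rejections."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    k_star = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * alpha / m:
            k_star = k  # largest k satisfying the step-up condition
    return sorted(order[:k_star])


def synth_bh(p_real, p_synth, weights, alpha=0.1):
    """Combine real and synthetic p-values per hypothesis, then apply BH."""
    q = [w * tp + (1 - w) * p for p, tp, w in zip(p_real, p_synth, weights)]
    return bh_stepup(q, alpha)
```

Note the fallback behavior described in step 5: with all weights at zero, `synth_bh` reduces exactly to `bh_stepup` on the real p-values.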

Results & Findings

| Experiment | Baseline (BH) | SynthBH (high‑quality synthetic) | SynthBH (low‑quality synthetic) |
| --- | --- | --- | --- |
| Tabular outlier detection (10K samples) | 0.62 power @ FDR = 0.1 | 0.78 power (≈ 25 % gain) | 0.61 power (no loss) |
| Drug‑cancer sensitivity (TCGA + GDSC) | 312 significant pairs | 398 pairs (≈ 27 % more) | 315 pairs |
| Simulated Gaussian tests (varying correlation) | FDR ≈ 0.099 | FDR ≤ 0.101 (maintained) | FDR ≤ 0.100 |

  • Power boost: When synthetic data capture true signal (e.g., simulated from the same generative model), SynthBH consistently discovers more true alternatives.
  • FDR safety: Across all settings, the empirical false discovery rate stays at or below the target $\alpha = 0.1$, confirming the theoretical guarantee.
  • Graceful degradation: With deliberately corrupted synthetic p‑values, SynthBH falls back to the performance of vanilla BH rather than inflating false discoveries.
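The FDR-safety claim can be probed with a toy Monte Carlo. This is a sketch of the general BH guarantee under independence, not a reproduction of the paper's experiments: 20 alternatives with very small p-values, 80 uniform nulls, and the false discovery proportion averaged over repeated draws (in this setup the expected FDP is bounded by alpha * m0/m = 0.08).

```python
import random


def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up; returns the set of rejected indices."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_star = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * alpha / m:
            k_star = k
    return set(order[:k_star])


def empirical_fdr(reps=200, m=100, n_alt=20, alpha=0.1, seed=0):
    """Average false discovery proportion over simulated testing problems."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # indices < n_alt are alternatives with tiny p-values; the rest are nulls
        p = [rng.random() * 1e-3 if i < n_alt else rng.random() for i in range(m)]
        rejected = bh_reject(p, alpha)
        false_rej = sum(1 for i in rejected if i >= n_alt)  # rejected true nulls
        total += false_rej / max(len(rejected), 1)
    return total / reps
```

Running `empirical_fdr()` yields an average FDP well below the nominal 0.1, consistent with the table above.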

Practical Implications

  • Accelerated discovery pipelines: In drug screening or genomics, researchers can reuse historical assay data or in‑silico simulations to augment current experiments, cutting down the number of wet‑lab runs needed for the same statistical power.
  • Integration with ML pipelines: Synthetic data generated by deep generative models (GANs, diffusion models) can be fed directly into SynthBH, allowing ML engineers to embed statistical rigor into model‑driven hypothesis testing.
  • Outlier detection in production systems: Monitoring services can combine live telemetry with synthetic “normal‑behavior” simulations to flag anomalies faster while controlling the false alarm rate.
  • Tooling: The method is simple to implement (a weighting step + standard BH), making it a drop‑in replacement for existing FDR libraries (e.g., statsmodels.stats.multitest).
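As the tooling bullet suggests, the combination step plus BH can be packaged behind a `multipletests`-style interface. The wrapper below is a hypothetical sketch in pure Python that only mirrors the (reject flags, adjusted p-values) return shape of `statsmodels.stats.multitest.multipletests`; it does not call statsmodels, and the function name is illustrative.

```python
def synthbh_multipletests(p_real, p_synth, weights, alpha=0.1):
    """Return (reject_flags, combined_pvals) after a SynthBH-style BH step-up."""
    m = len(p_real)
    # per-hypothesis convex combination of real and synthetic p-values
    q = [w * tp + (1 - w) * p for p, tp, w in zip(p_real, p_synth, weights)]
    order = sorted(range(m), key=lambda i: q[i])
    k_star = 0
    for k, i in enumerate(order, start=1):
        if q[i] <= k * alpha / m:
            k_star = k
    rejected = set(order[:k_star])
    return [i in rejected for i in range(m)], q
```

Because the return shape matches the first two elements of `multipletests`, swapping it into an existing FDR pipeline mostly means threading the synthetic p-values and weights through.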

Limitations & Future Work

  • Dependence assumption: The PRDS condition, while mild, may still be violated in highly structured data (e.g., spatially correlated genomics). Extending guarantees to arbitrary dependence structures is an open challenge.
  • Weight estimation: The current calibration routine is heuristic; more sophisticated, possibly Bayesian, approaches could yield tighter weights and further power gains.
  • Scalability to millions of tests: Although computationally cheap per test, the method’s performance on ultra‑large‑scale settings (e.g., whole‑genome scans) warrants profiling and potential parallelization strategies.
  • Broader synthetic sources: Future work could explore multi‑modal synthetic inputs (text, images) and how to fuse heterogeneous evidence streams within the SynthBH framework.

Authors

  • Yonghoon Lee
  • Meshi Bashari
  • Edgar Dobriban
  • Yaniv Romano

Paper Information

  • arXiv ID: 2602.16690v1
  • Categories: stat.ME, cs.LG, stat.ML
  • Published: February 18, 2026