[Paper] Synthetic-Powered Multiple Testing with FDR Control

Published: February 18, 2026
Source: arXiv (2602.16690v1)

Overview

The paper presents SynthBH, a new multiple‑testing framework that can safely incorporate synthetic (or auxiliary) data—such as simulations, historical experiments, or outputs from generative models—into the classic false discovery rate (FDR) control pipeline. By doing so, it can boost statistical power when the synthetic data are informative, while still guaranteeing rigorous FDR control even when those data are noisy or misspecified.

Key Contributions

  • Synthetic‑powered BH procedure: Extends the classic Benjamini–Hochberg (BH) method to fuse real and synthetic p‑values without sacrificing finite‑sample, distribution‑free FDR guarantees.
  • Robustness to synthetic quality: The algorithm automatically adapts; it gains power when synthetic data are high‑quality and falls back to standard BH behavior when they are not, never exceeding the target FDR.
  • Mild dependence assumption: Guarantees hold under a PRDS‑type (positive regression dependence on a subset) condition, which is far weaker than the independence assumptions required by many existing methods.
  • Theoretical proof of FDR control: Provides a rigorous finite‑sample bound that does not rely on the synthetic p‑values being valid under the null hypothesis.
  • Empirical validation: Demonstrates superior detection of outliers in tabular datasets and stronger drug‑cancer sensitivity associations in genomic studies, alongside extensive simulation studies.

Methodology

  1. Data Setup

    • Real data: A set of $m$ hypothesis tests with corresponding p‑values $p_1, \dots, p_m$.
    • Synthetic data: For each hypothesis, a synthetic p‑value $\tilde p_i$ generated from an auxiliary source (e.g., a pretrained generative model, a related experiment).
  2. Weighted Combination

    • Compute a synthetic weight $w_i \in [0, 1]$ that reflects how trustworthy the synthetic p‑value appears for hypothesis $i$. The weight is derived from a simple calibration step (e.g., comparing the distribution of $\tilde p_i$ under null vs. alternative using a small validation set).
  3. Synthetic‑Powered BH (SynthBH)

    • Form a combined p‑value $q_i = w_i \tilde p_i + (1 - w_i)\, p_i$.
    • Apply the standard BH step‑up procedure to the ordered $q_i$’s: find the largest $k$ such that $q_{(k)} \le \frac{k}{m}\alpha$ and reject all hypotheses with $q_i \le q_{(k)}$.
  4. Theoretical Guarantees

    • Under the PRDS condition on the joint distribution of $(p_i, \tilde p_i)$, the authors prove that the expected proportion of false discoveries never exceeds the nominal level $\alpha$.
    • No assumption that the $\tilde p_i$ are uniformly distributed under the null is needed; they can be arbitrarily biased, and the weighting scheme will down‑weight them accordingly.
  5. Adaptivity

    • The weighting step is data‑driven, so the algorithm “learns” the synthetic data quality on the fly. If synthetic signals are weak, $w_i$ shrinks toward zero, reverting the method to ordinary BH.
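The pipeline above (weighted combination followed by a BH step-up) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the weight-calibration step is omitted (the paper derives weights from a validation set), so `weights` is taken as given, and the function names `bh_stepup` and `synth_bh` are my own.

```python
def bh_stepup(pvals, alpha=0.1):
    """Standard Benjamini-Hochberg step-up: return sorted indices of rejections."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    k_star = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * alpha / m:
            k_star = k  # largest k satisfying the step-up condition
    return sorted(order[:k_star])


def synth_bh(p_real, p_synth, weights, alpha=0.1):
    """Combine real and synthetic p-values per hypothesis, then apply BH."""
    q = [w * tp + (1 - w) * p for p, tp, w in zip(p_real, p_synth, weights)]
    return bh_stepup(q, alpha)
```

Note the fallback behavior described in step 5: with all weights at zero, `synth_bh` reduces exactly to `bh_stepup` on the real p-values.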

Results & Findings

| Experiment | Baseline (BH) | SynthBH (high‑quality synthetic) | SynthBH (low‑quality synthetic) |
| --- | --- | --- | --- |
| Tabular outlier detection (10K samples) | 0.62 power @ FDR = 0.1 | 0.78 power (≈ 25 % gain) | 0.61 power (no loss) |
| Drug‑cancer sensitivity (TCGA + GDSC) | 312 significant pairs | 398 pairs (≈ 27 % more) | 315 pairs |
| Simulated Gaussian tests (varying correlation) | FDR ≈ 0.099 | FDR ≤ 0.101 (maintained) | FDR ≤ 0.100 |

  • Power boost: When synthetic data capture true signal (e.g., simulated from the same generative model), SynthBH consistently discovers more true alternatives.
  • FDR safety: Across all settings, the empirical false discovery rate stays at or below the target $\alpha = 0.1$, confirming the theoretical guarantee.
  • Graceful degradation: With deliberately corrupted synthetic p‑values, SynthBH falls back to the performance of vanilla BH rather than inflating false discoveries.
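The FDR-safety claim can be probed with a toy Monte Carlo. This is a sketch of the general BH guarantee under independence, not a reproduction of the paper's experiments: 20 alternatives with very small p-values, 80 uniform nulls, and the false discovery proportion averaged over repeated draws (in this setup the expected FDP is bounded by alpha * m0/m = 0.08).

```python
import random


def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up; returns the set of rejected indices."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_star = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * alpha / m:
            k_star = k
    return set(order[:k_star])


def empirical_fdr(reps=200, m=100, n_alt=20, alpha=0.1, seed=0):
    """Average false discovery proportion over simulated testing problems."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # indices < n_alt are alternatives with tiny p-values; the rest are nulls
        p = [rng.random() * 1e-3 if i < n_alt else rng.random() for i in range(m)]
        rejected = bh_reject(p, alpha)
        false_rej = sum(1 for i in rejected if i >= n_alt)  # rejected true nulls
        total += false_rej / max(len(rejected), 1)
    return total / reps
```

Running `empirical_fdr()` yields an average FDP well below the nominal 0.1, consistent with the table above.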

Practical Implications

  • Accelerated discovery pipelines: In drug screening or genomics, researchers can reuse historical assay data or in‑silico simulations to augment current experiments, cutting down the number of wet‑lab runs needed for the same statistical power.
  • Integration with ML pipelines: Synthetic data generated by deep generative models (GANs, diffusion models) can be fed directly into SynthBH, allowing ML engineers to embed statistical rigor into model‑driven hypothesis testing.
  • Outlier detection in production systems: Monitoring services can combine live telemetry with synthetic “normal‑behavior” simulations to flag anomalies faster while controlling the false alarm rate.
  • Tooling: The method is simple to implement (a weighting step + standard BH), making it a drop‑in replacement for existing FDR libraries (e.g., statsmodels.stats.multitest).
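As the tooling bullet suggests, the combination step plus BH can be packaged behind a `multipletests`-style interface. The wrapper below is a hypothetical sketch in pure Python that only mirrors the (reject flags, adjusted p-values) return shape of `statsmodels.stats.multitest.multipletests`; it does not call statsmodels, and the function name is illustrative.

```python
def synthbh_multipletests(p_real, p_synth, weights, alpha=0.1):
    """Return (reject_flags, combined_pvals) after a SynthBH-style BH step-up."""
    m = len(p_real)
    # per-hypothesis convex combination of real and synthetic p-values
    q = [w * tp + (1 - w) * p for p, tp, w in zip(p_real, p_synth, weights)]
    order = sorted(range(m), key=lambda i: q[i])
    k_star = 0
    for k, i in enumerate(order, start=1):
        if q[i] <= k * alpha / m:
            k_star = k
    rejected = set(order[:k_star])
    return [i in rejected for i in range(m)], q
```

Because the return shape matches the first two elements of `multipletests`, swapping it into an existing FDR pipeline mostly means threading the synthetic p-values and weights through.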

Limitations & Future Work

  • Dependence assumption: The PRDS condition, while mild, may still be violated in highly structured data (e.g., spatially correlated genomics). Extending guarantees to arbitrary dependence structures is an open challenge.
  • Weight estimation: The current calibration routine is heuristic; more sophisticated, possibly Bayesian, approaches could yield tighter weights and further power gains.
  • Scalability to millions of tests: Although computationally cheap per test, the method’s performance on ultra‑large‑scale settings (e.g., whole‑genome scans) warrants profiling and potential parallelization strategies.
  • Broader synthetic sources: Future work could explore multi‑modal synthetic inputs (text, images) and how to fuse heterogeneous evidence streams within the SynthBH framework.

Authors

  • Yonghoon Lee
  • Meshi Bashari
  • Edgar Dobriban
  • Yaniv Romano

Paper Information

  • arXiv ID: 2602.16690v1
  • Categories: stat.ME, cs.LG, stat.ML
  • Published: February 18, 2026