[Paper] Integrating Feature Correlation in Differential Privacy with Applications in DP-ERM
Source: arXiv - 2605.03945v1
Overview
The paper tackles a subtle but important gap in differential privacy (DP): the one‑size‑fits‑all privacy budget that treats every feature of a dataset as equally sensitive. In many real‑world pipelines, only a subset of attributes (e.g., health codes, financial IDs) truly needs protection, while others (e.g., timestamps, non‑identifying demographics) are essentially “insensitive.” The authors propose a new DP definition, CorrDP, that relaxes privacy guarantees for these insensitive features while still accounting for their statistical correlation with the sensitive ones. This yields better utility for downstream machine‑learning tasks such as empirical risk minimization (ERM).
Key Contributions
- CorrDP definition – a formal privacy notion that distinguishes sensitive vs. insensitive features and quantifies their correlation via total variation distance.
- Correlation‑aware DP‑ERM algorithms – gradient‑based optimization methods that inject distance‑dependent noise, scaling the perturbation to the measured correlation.
- Correlation estimation procedure – a data‑driven technique to approximate the unknown correlation distance, preserving the same privacy‑utility trade‑off.
- Theoretical utility analysis – proofs that CorrDP‑ERM achieves strictly better excess risk bounds than standard DP‑ERM when insensitive features are present.
- Empirical validation – experiments on synthetic benchmarks and real datasets (e.g., UCI Adult, credit‑card fraud) showing consistent accuracy gains over classic DP baselines.
Methodology
Feature Partitioning
- The features are split into two groups: S (sensitive) and I (insensitive).
- The privacy guarantee is enforced only on S, but the algorithm must still respect any statistical dependence between S and I.
Correlation Metric
- Correlation is measured by the total variation distance \( \Delta = d_{\mathrm{TV}}(P_{S,I}, P_S \times P_I) \).
- Intuitively, \( \Delta = 0 \) means the two sets are independent; larger values indicate stronger coupling.
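For discrete features, the distance above can be computed directly from a joint probability table. The following is a minimal sketch under that assumption; the function name `tv_dependence` and the toy tables are illustrative and do not come from the paper.

```python
import numpy as np

def tv_dependence(joint):
    """Total variation distance between a joint pmf and the product of its marginals.

    joint[s, i] is P(S = s, I = i); entries are non-negative and sum to 1.
    """
    joint = np.asarray(joint, dtype=float)
    p_s = joint.sum(axis=1, keepdims=True)  # marginal of S, shape (|S|, 1)
    p_i = joint.sum(axis=0, keepdims=True)  # marginal of I, shape (1, |I|)
    product = p_s * p_i                     # what the joint would be under independence
    return 0.5 * np.abs(joint - product).sum()

# Independent pair: the joint is exactly the outer product of its marginals.
independent = np.outer([0.5, 0.5], [0.3, 0.7])
# Perfectly coupled binary pair: I is a copy of S.
coupled = np.array([[0.5, 0.0], [0.0, 0.5]])

delta_independent = tv_dependence(independent)
delta_coupled = tv_dependence(coupled)
print(delta_independent, delta_coupled)
```

The independent table gives a distance of zero (up to floating-point error), while the perfectly coupled balanced binary pair gives 0.5.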
CorrDP Definition
- A mechanism \( \mathcal{M} \) satisfies \( (\varepsilon,\delta,\Delta) \)-CorrDP if for any neighboring datasets that differ only in a sensitive record, the output distributions differ by at most \( (\varepsilon,\delta) \) after marginalizing over the insensitive attributes, with an additional term that scales with \( \Delta \).
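For reference, the standard \( (\varepsilon,\delta) \)-DP condition that this definition relaxes can be written as below. This is the textbook definition only, with \( O \) denoting an arbitrary measurable output set; the exact form of the \( \Delta \)-dependent term in CorrDP is not given in this summary, so it is not reproduced here.

```latex
\[
\Pr[\mathcal{M}(D) \in O] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in O] + \delta
\quad \text{for all neighboring datasets } D, D' \text{ and all measurable sets } O .
\]
```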
Gradient Perturbation for DP‑ERM
- Standard DP‑ERM adds isotropic Gaussian noise calibrated to the global sensitivity of the gradient.
- CorrDP‑ERM instead adds noise with variance proportional to
\[ \sigma^2 = \frac{2\log(1.25/\delta)}{\varepsilon^2} \cdot (1 - \Delta). \]
- When \( \Delta \) is small (weak correlation), the noise shrinks, yielding more accurate updates.
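The noise-scaling formula can be sketched in a few lines. This example takes the expression for \( \sigma^2 \) at face value as stated in this summary and makes no claim about how the paper actually calibrates its noise. The names `corrdp_sigma` and `noisy_gradient` are hypothetical, `corr` stands for \( \Delta \), and per-example clipping is assumed as the usual way of bounding gradient sensitivity. At \( \Delta = 0 \) the factor \( (1 - \Delta) \) equals 1 and the expression reduces to the classical Gaussian-mechanism scale.

```python
import math
import numpy as np

def corrdp_sigma(epsilon, delta, corr):
    """Noise multiplier from sigma^2 = 2*log(1.25/delta)/epsilon^2 * (1 - corr)."""
    if not (epsilon > 0 and 0 < delta < 1 and 0 <= corr <= 1):
        raise ValueError("need epsilon > 0, 0 < delta < 1, 0 <= corr <= 1")
    variance = 2.0 * math.log(1.25 / delta) / epsilon**2 * (1.0 - corr)
    return math.sqrt(variance)

def noisy_gradient(per_example_grads, clip_norm, sigma, rng):
    """Clip each per-example gradient to L2 norm clip_norm, average, add Gaussian noise."""
    grads = np.asarray(per_example_grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale down only the gradients whose norm exceeds clip_norm.
    clipped = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    n = grads.shape[0]
    # Noise with std sigma * clip_norm on the sum becomes sigma * clip_norm / n on the mean.
    noise = rng.normal(0.0, sigma * clip_norm / n, size=grads.shape[1])
    return clipped.mean(axis=0) + noise

rng = np.random.default_rng(0)
sigma_std = corrdp_sigma(epsilon=1.0, delta=1e-5, corr=0.0)    # classical Gaussian scale
sigma_corr = corrdp_sigma(epsilon=1.0, delta=1e-5, corr=0.36)  # scaled by sqrt(1 - 0.36)
grads = rng.normal(size=(64, 5))                               # toy per-example gradients
step = noisy_gradient(grads, clip_norm=1.0, sigma=sigma_corr, rng=rng)
print(sigma_std, sigma_corr, step)
```

With `corr=0.36` the multiplier is 0.8 times the classical one, because the variance is scaled by 0.64.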
Estimating \( \Delta \) from Data
- The authors propose a private estimator based on a hold‑out sample and a two‑sample test, adding a small amount of Laplace noise to preserve DP.
- The estimator is unbiased up to \( O(1/\sqrt{n}) \) and can be plugged back into the noise‑scaling formula without breaking the overall privacy guarantee.
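To make the idea of a noisy correlation estimate concrete, here is a minimal sketch that swaps in a simpler technique: a plug-in estimate of the total variation distance on discrete features, released through the Laplace mechanism. It is not the authors' hold-out, two-sample-test procedure. The function name `private_tv_estimate` and the sensitivity bound of 3/n (for replacing one record) are assumptions made for this illustration.

```python
import numpy as np

def private_tv_estimate(s_vals, i_vals, epsilon, rng):
    """Plug-in estimate of d_TV(P_{S,I}, P_S x P_I) from paired samples, plus Laplace noise.

    Assumption: replacing one record moves the plug-in value by at most 3/n
    (1/n from the empirical joint, 2/n from the product of empirical marginals),
    so Laplace noise with scale 3/(n*epsilon) makes this step epsilon-DP.
    """
    s_vals = np.asarray(s_vals)
    i_vals = np.asarray(i_vals)
    n = s_vals.shape[0]
    # Map raw values to contiguous integer codes so they can index a count table.
    _, s_codes = np.unique(s_vals, return_inverse=True)
    _, i_codes = np.unique(i_vals, return_inverse=True)
    counts = np.zeros((s_codes.max() + 1, i_codes.max() + 1))
    np.add.at(counts, (s_codes, i_codes), 1.0)
    joint = counts / n
    product = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    plug_in = 0.5 * np.abs(joint - product).sum()
    noisy = plug_in + rng.laplace(loc=0.0, scale=3.0 / (n * epsilon))
    # Clipping to [0, 1] is post-processing and does not affect the privacy guarantee.
    return float(np.clip(noisy, 0.0, 1.0))

rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=5000)
i_indep = rng.integers(0, 3, size=5000)  # drawn independently of s
i_copy = s.copy()                        # perfectly coupled with s
est_indep = private_tv_estimate(s, i_indep, epsilon=1.0, rng=rng)
est_copy = private_tv_estimate(s, i_copy, epsilon=1.0, rng=rng)
print(est_indep, est_copy)
```

The budget spent on this step adds to whatever the training procedure consumes, so a full pipeline would need to account for both.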
Results & Findings
| Dataset | Sensitive/Insensitive Split | Standard DP‑ERM (ε=1) | CorrDP‑ERM (ε=1) | Relative Accuracy Gain |
|---|---|---|---|---|
| Synthetic (Gaussian) | 30 % sensitive | 78 % | 86 % | +10 % |
| UCI Adult | Income (sensitive) vs. demographics (insensitive) | 81 % | 87 % | +6 % |
| Credit‑Card Fraud | Transaction amount (sensitive) vs. timestamp (insensitive) | 92 % | 95 % | +3 % |
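The last column can be recomputed from the two accuracy columns. The short check below does so in both percentage points and relative percent, since the table does not state which convention it uses; the accuracy values are copied from the rows above.

```python
rows = {
    "Synthetic (Gaussian)": (78, 86),
    "UCI Adult": (81, 87),
    "Credit-Card Fraud": (92, 95),
}

gains = {}
for name, (baseline, corrdp) in rows.items():
    points = corrdp - baseline                         # absolute gain in percentage points
    relative = 100.0 * (corrdp - baseline) / baseline  # gain relative to the baseline, in percent
    gains[name] = (points, round(relative, 1))
    print(f"{name}: +{points} points, +{relative:.1f} % relative")
```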
- Utility: Across all experiments, CorrDP‑ERM reduced excess risk by 15‑30 % compared with the classic DP baseline.
- Robustness to Estimation Error: When the correlation distance was estimated rather than known, the utility loss was negligible (<2 %).
- Scalability: The algorithms run in the same asymptotic time as standard DP‑ERM (single pass over minibatches), with only a tiny overhead for the correlation estimator.
Practical Implications
- Feature‑aware privacy budgeting – Teams can relax the privacy guarantee for benign attributes (e.g., timestamps, device IDs) without compromising protection of truly sensitive fields, provided the correlation between the two groups is accounted for.
- Reduced noise for ML pipelines – For tasks like logistic regression, SVMs, or deep‑learning fine‑tuning, CorrDP‑ERM translates into higher model accuracy at the same privacy level \( (\varepsilon, \delta) \).
- Regulatory compliance – Regulations such as GDPR or CCPA often require “data minimisation.” CorrDP provides a formal way to demonstrate that only the necessary attributes receive strong DP guarantees.
- Tooling integration – The noise‑scaling step could be layered on top of existing DP libraries (TensorFlow Privacy, PyTorch Opacus), which already expose a noise multiplier for gradient perturbation; a correlation estimate would then be used to adjust that noise scale.
- Cross‑domain applicability – Any domain where protected and non‑protected fields coexist—healthcare (PHI vs. vitals), finance (account numbers vs. timestamps), IoT telemetry—can benefit from the CorrDP framework.
Limitations & Future Work
- Assumption of known partition – The approach presumes developers can correctly label features as sensitive or insensitive; mis‑classification could weaken privacy.
- Correlation measured only by total variation – While mathematically convenient, TV distance may be overly pessimistic for high‑dimensional data; exploring tighter dependence measures (e.g., mutual information) is an open direction.
- Static correlation – The current estimator treats correlation as a global scalar; future work could handle feature‑wise or instance‑wise correlation for finer granularity.
- Extension beyond ERM – Applying CorrDP to other DP primitives (e.g., private query answering, federated learning) remains to be explored.
Overall, the paper opens a practical pathway to more nuanced privacy engineering, letting developers preserve utility where it matters while still honoring rigorous differential‑privacy guarantees.
Authors
- Tianyu Wang
- Luhao Zhang
- Rachel Cummings
Paper Information
- arXiv ID: 2605.03945v1
- Categories: cs.LG, stat.ML
- Published: May 5, 2026