[Paper] Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach
Source: arXiv - 2512.07814v1
Overview
Large language models for code (LLM4Code) are becoming indispensable tools for developers, but they also inherit the privacy risks of the massive open‑source codebases they are trained on. This paper digs into why certain kinds of personally identifiable information (PII) are more likely to be memorized and later reproduced by these models, and it does so with a causal lens that goes beyond treating PII as a monolithic block.
Key Contributions
- Fine‑grained PII taxonomy: Built a curated dataset covering multiple PII categories (IP addresses, email addresses, API keys, passwords, etc.) rather than a single “PII” label.
- Training‑dynamics analysis: Measured how quickly and how confidently models learn each PII instance during fine‑tuning, using per‑example loss and gradient statistics.
- Structural causal model (SCM): Formulated an SCM that links learnability (as captured by training dynamics) to leakage (the model’s propensity to reproduce the PII).
- Empirical causal evidence: Demonstrated that the causal effect of learnability on leakage varies dramatically across PII types—e.g., IP addresses have a strong positive effect, while cryptographic keys show a weak or negligible effect.
- Guidelines for defenses: Provided actionable insights for designing type‑aware mitigation strategies (e.g., selective data sanitization, learnability‑aware regularization).
Methodology
- Dataset construction – The authors mined public GitHub repositories and extracted real‑world PII instances, labeling each into a distinct category (network identifiers, credentials, personal contacts, etc.); a categorization sketch appears after this list.
- Model fine‑tuning – Two model sizes from a single LLM4Code family (a 350 M‑parameter and a 2.7 B‑parameter model) were fine‑tuned on the same code corpus, which includes the PII dataset.
- Training dynamics extraction – For every PII example, the team recorded per‑step loss, gradient norm, and prediction confidence during training. These signals serve as proxies for how easily the model fits the example (see the dynamics‑recording sketch below).
- Leakage probing – After training, the models were prompted with code contexts that could trigger memorization, and any exact PII string in the generated output was counted as a leak (see the probing sketch below).
- Causal analysis – Using the extracted dynamics as an intermediate variable, a structural causal model was built to estimate the average treatment effect of learnability on leakage for each PII type, controlling for confounders such as token frequency and length (see the effect‑estimation sketch below).
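To make the categorization step concrete, the sketch below assigns extracted strings to coarse PII types with regular expressions; the patterns and category names are illustrative assumptions, not the authors' labeling pipeline.

```python
import re

# Illustrative patterns only; the paper's taxonomy and extraction rules
# (network identifiers, credentials, personal contacts, etc.) are richer.
PII_PATTERNS = {
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "api_key": re.compile(r"\b(?:sk|pk|AKIA)[A-Za-z0-9]{16,}\b"),
}

def categorize(candidate: str) -> str:
    """Return the first matching PII category, or 'ambiguous' if none match."""
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(candidate):
            return label
    return "ambiguous"

print(categorize("server = '192.168.0.1'"))    # ip_address
print(categorize("contact: dev@example.com"))  # email
```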
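The dynamics‑recording sketch below shows one way such per‑example signals could be captured, assuming a PyTorch‑style training loop; the toy model, random token data, and the loss‑drop proxy for learnability are stand‑ins for illustration, not the paper's actual models or metrics.

```python
import torch
import torch.nn.functional as F

# Toy next-token model standing in for an LLM4Code checkpoint; the recorded
# per-example losses and step-level gradient norms are the kind of
# training-dynamics signals used as learnability proxies.
torch.manual_seed(0)
vocab, dim, n_examples, seq_len = 100, 32, 8, 16
model = torch.nn.Sequential(torch.nn.Embedding(vocab, dim), torch.nn.Linear(dim, vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab, (n_examples, seq_len))  # stand-in for PII-bearing snippets
inputs, targets = tokens[:, :-1], tokens[:, 1:]

loss_trajectories = {i: [] for i in range(n_examples)}
grad_norms = []

for step in range(20):
    optimizer.zero_grad()
    logits = model(inputs)                                   # (n_examples, seq_len-1, vocab)
    per_token = F.cross_entropy(
        logits.reshape(-1, vocab), targets.reshape(-1), reduction="none"
    ).reshape(n_examples, -1)
    per_example = per_token.mean(dim=1)                      # one loss value per snippet
    per_example.mean().backward()
    grads = [p.grad.norm() for p in model.parameters() if p.grad is not None]
    grad_norms.append(torch.norm(torch.stack(grads)).item())
    optimizer.step()
    for i, value in enumerate(per_example.detach().tolist()):
        loss_trajectories[i].append(value)

# Simple learnability proxy: how far each example's loss dropped over training.
learnability = {i: traj[0] - traj[-1] for i, traj in loss_trajectories.items()}
```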
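The probing sketch below is a minimal harness for the leakage measurement, assuming any prompt‑to‑completion callable wrapping a fine‑tuned model; the stubbed model in the usage line is purely hypothetical.

```python
from typing import Callable, Iterable, Tuple

def leakage_rate(
    generate: Callable[[str], str],
    probes: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of probes whose completion contains the exact PII string.

    Each probe pairs a triggering code context with the ground-truth PII
    that the context might cause the model to reveal.
    """
    probes = list(probes)
    leaks = sum(1 for context, secret in probes if secret in generate(context))
    return leaks / len(probes) if probes else 0.0

# Hypothetical usage with a stubbed "model" that always emits an IP address.
stub = lambda prompt: prompt + " 192.168.0.1"
print(leakage_rate(stub, [("HOST =", "192.168.0.1"), ("API_KEY =", "sk_live_abc")]))  # 0.5
```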
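The effect‑estimation sketch below approximates the per‑type analysis with simple regression adjustment on synthetic data: regress the leak indicator on learnability plus confounders (token frequency and length) and read off the learnability coefficient. This linear‑probability shortcut only conveys the spirit of the paper's SCM‑based estimation.

```python
import numpy as np

def learnability_effect(learnability, leaked, confounders):
    """Slope of leakage on learnability within one PII type, adjusting for
    confounders via ordinary least squares (a linear-probability approximation)."""
    X = np.column_stack([np.ones(len(learnability)), learnability, confounders])
    coef, *_ = np.linalg.lstsq(X, leaked, rcond=None)
    return coef[1]  # effect of learnability, holding confounders fixed

# Synthetic data for 200 instances of a single PII type.
rng = np.random.default_rng(0)
token_freq = rng.poisson(5, 200)
token_len = rng.integers(8, 40, 200)
learn = rng.normal(0.0, 1.0, 200) + 0.1 * token_freq
leak_prob = 1.0 / (1.0 + np.exp(-(1.5 * learn - 2.0)))
leaked = (rng.random(200) < leak_prob).astype(float)

print(learnability_effect(learn, leaked, np.column_stack([token_freq, token_len])))
```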
Results & Findings
| PII Type | Learnability (average loss drop) | Leakage Rate (post‑training) | Causal Effect |
|---|---|---|---|
| IP address | High (fast loss reduction) | ≈ 22 % | Strong positive |
| Email address | Medium | ≈ 12 % | Moderate |
| API key | Low‑medium | ≈ 5 % | Weak |
| Password / secret key | Low (slow learning) | ≈ 1 % | Negligible |
| Ambiguous identifiers (e.g., usernames) | Mixed | 4‑15 % (varied) | Inconsistent |
Key takeaways:
- Learnability predicts leakage. Instances that the model quickly fits (low loss, high confidence) are far more likely to be reproduced verbatim.
- Scale matters, but not uniformly. The larger 2.7 B model shows higher overall leakage, yet the relative ordering across PII types stays the same.
- Ambiguity introduces noise. When a token can appear both as PII and as a benign identifier, the causal link weakens, leading to mixed leakage behavior.
Practical Implications
- Targeted data sanitization: Instead of blanket removal of all PII, developers can prioritize scrubbing high‑learnability items (e.g., IPs, emails) that pose the greatest leakage risk.
- Learnability‑aware regularization: Training pipelines could incorporate dynamic loss weighting that penalizes rapid memorization of sensitive tokens, reducing their causal impact on leakage (a loss‑weighting sketch follows this list).
- Model‑level monitoring: By tracking training‑dynamics metrics in real time, teams can flag “hot” PII examples that are being memorized and intervene before deployment.
- Policy & compliance tooling: The causal framework offers a quantitative basis for compliance reports (e.g., GDPR) by showing which data categories are most vulnerable to accidental exposure.
- Design of safer code assistants: Product teams can embed type‑specific redaction rules (e.g., mask IP addresses in completions) without degrading overall code suggestion quality (a redaction sketch also follows this list).
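As one possible reading of learnability‑aware regularization, the loss‑weighting sketch below down‑weights sensitive token positions whose loss has already fallen, reducing the training pressure to memorize them verbatim. The sensitive‑position mask, the EMA‑based weight, and the demo tensors are assumptions; the paper does not prescribe or evaluate this exact mechanism.

```python
import torch
import torch.nn.functional as F

def learnability_aware_loss(logits, targets, sensitive_mask, loss_ema, alpha=0.9):
    """Cross-entropy with dynamic down-weighting of rapidly memorized PII tokens.

    sensitive_mask: bool tensor marking PII token positions (from taxonomy labels).
    loss_ema: running average of each position's past loss; sensitive positions
    whose loss has fallen (i.e., are being memorized) receive a smaller weight.
    """
    vocab = logits.size(-1)
    per_token = F.cross_entropy(
        logits.reshape(-1, vocab), targets.reshape(-1), reduction="none"
    ).reshape(targets.shape)
    loss_ema = alpha * loss_ema + (1 - alpha) * per_token.detach()
    # Weight in (0, 1]: sensitive tokens with low remaining loss contribute little;
    # non-sensitive tokens keep full weight.
    weights = torch.where(sensitive_mask, loss_ema / (loss_ema + 1.0), torch.ones_like(loss_ema))
    return (weights * per_token).mean(), loss_ema

# Tiny demo with random tensors (batch of 2, sequence length 5, vocab size 50).
logits = torch.randn(2, 5, 50)
targets = torch.randint(0, 50, (2, 5))
mask = torch.zeros(2, 5, dtype=torch.bool)
mask[0, 2] = True                     # pretend position (0, 2) holds a PII token
loss, ema = learnability_aware_loss(logits, targets, mask, torch.ones(2, 5))
```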
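And a minimal redaction sketch for suggestion‑time masking of the highest‑risk, high‑learnability types; the regex patterns and placeholder tokens are illustrative, not a vetted production filter.

```python
import re

# Illustrative rules targeting the PII types with the strongest leakage effect.
REDACTION_RULES = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP_REDACTED>"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "<EMAIL_REDACTED>"),
]

def redact(completion: str) -> str:
    """Mask matching PII types in a code suggestion before it reaches the user."""
    for pattern, placeholder in REDACTION_RULES:
        completion = pattern.sub(placeholder, completion)
    return completion

print(redact("connect('10.0.0.2', user='alice@corp.example')"))
# connect('<IP_REDACTED>', user='<EMAIL_REDACTED>')
```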
Limitations & Future Work
- Dataset scope: The study relies on publicly available GitHub data; private repositories or non‑English codebases may exhibit different dynamics.
- Model diversity: Only two model sizes from a single architecture family were examined; transformer variants, retrieval‑augmented models, or instruction‑tuned LLMs could behave differently.
- Causal assumptions: The SCM treats training dynamics as the sole mediator; other latent factors (e.g., data duplication, tokenization quirks) might also influence leakage.
- Defensive evaluation: While the paper proposes type‑aware defenses, it does not empirically test their effectiveness in a production setting.
Future research directions include expanding the taxonomy to cover emerging PII (e.g., OAuth tokens), applying the causal analysis to multimodal code models, and building automated tooling that integrates learnability monitoring into CI/CD pipelines.
Authors
- Hua Yang
- Alejandro Velasco
- Sen Fang
- Bowen Xu
- Denys Poshyvanyk
Paper Information
- arXiv ID: 2512.07814v1
- Categories: cs.SE, cs.AI, cs.CR
- Published: December 8, 2025