[Paper] Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach
Source: arXiv - 2512.07814v1
Overview
Large language models for code (LLM4Code) are becoming indispensable tools for developers, but they also inherit the privacy risks of the massive open‑source codebases they are trained on. This paper digs into why certain kinds of personally identifiable information (PII) are more likely to be memorized and later reproduced by these models, and it does so with a causal lens that goes beyond treating PII as a monolithic block.
Key Contributions
- Fine‑grained PII taxonomy: Built a curated dataset covering multiple PII categories (IP addresses, email addresses, API keys, passwords, etc.) rather than a single “PII” label.
- Training‑dynamics analysis: Measured how quickly and how confidently models learn each PII instance during fine‑tuning, using per‑example loss and gradient statistics.
- Structural causal model (SCM): Formulated an SCM that links learnability (as captured by training dynamics) to leakage (the model’s propensity to reproduce the PII).
- Empirical causal evidence: Demonstrated that the causal effect of learnability on leakage varies dramatically across PII types—e.g., IP addresses have a strong positive effect, while cryptographic keys show a weak or negligible effect.
- Guidelines for defenses: Provided actionable insights for designing type‑aware mitigation strategies (e.g., selective data sanitization, learnability‑aware regularization).
Methodology
- Dataset construction – The authors mined public GitHub repositories and extracted real‑world PII instances, labeling each into a distinct category (network identifiers, credentials, personal contacts, etc.); a categorization sketch appears after this list.
- Model fine‑tuning – Two model sizes from a single LLM4Code family (a 350 M‑parameter and a 2.7 B‑parameter model) were fine‑tuned on the same code corpus, which includes the PII dataset.
- Training dynamics extraction – For every PII example, the team recorded per‑step loss, gradient norm, and prediction confidence during training. These signals serve as proxies for how easily the model fits the example (see the dynamics‑recording sketch below).
- Leakage probing – After training, the models were prompted with code contexts that could trigger memorization, and any exact PII string in the generated output was counted as a leak (see the probing sketch below).
- Causal analysis – Using the extracted dynamics as an intermediate variable, a structural causal model was built to estimate the average treatment effect of learnability on leakage for each PII type, controlling for confounders such as token frequency and length (see the effect‑estimation sketch below).
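To make the categorization step concrete, the sketch below assigns extracted strings to coarse PII types with regular expressions; the patterns and category names are illustrative assumptions, not the authors' labeling pipeline.

```python
import re

# Illustrative patterns only; the paper's taxonomy and extraction rules
# (network identifiers, credentials, personal contacts, etc.) are richer.
PII_PATTERNS = {
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "api_key": re.compile(r"\b(?:sk|pk|AKIA)[A-Za-z0-9]{16,}\b"),
}

def categorize(candidate: str) -> str:
    """Return the first matching PII category, or 'ambiguous' if none match."""
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(candidate):
            return label
    return "ambiguous"

print(categorize("server = '192.168.0.1'"))    # ip_address
print(categorize("contact: dev@example.com"))  # email
```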
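The dynamics‑recording sketch below shows one way such per‑example signals could be captured, assuming a PyTorch‑style training loop; the toy model, random token data, and the loss‑drop proxy for learnability are stand‑ins for illustration, not the paper's actual models or metrics.

```python
import torch
import torch.nn.functional as F

# Toy next-token model standing in for an LLM4Code checkpoint; the recorded
# per-example losses and step-level gradient norms are the kind of
# training-dynamics signals used as learnability proxies.
torch.manual_seed(0)
vocab, dim, n_examples, seq_len = 100, 32, 8, 16
model = torch.nn.Sequential(torch.nn.Embedding(vocab, dim), torch.nn.Linear(dim, vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab, (n_examples, seq_len))  # stand-in for PII-bearing snippets
inputs, targets = tokens[:, :-1], tokens[:, 1:]

loss_trajectories = {i: [] for i in range(n_examples)}
grad_norms = []

for step in range(20):
    optimizer.zero_grad()
    logits = model(inputs)                                   # (n_examples, seq_len-1, vocab)
    per_token = F.cross_entropy(
        logits.reshape(-1, vocab), targets.reshape(-1), reduction="none"
    ).reshape(n_examples, -1)
    per_example = per_token.mean(dim=1)                      # one loss value per snippet
    per_example.mean().backward()
    grads = [p.grad.norm() for p in model.parameters() if p.grad is not None]
    grad_norms.append(torch.norm(torch.stack(grads)).item())
    optimizer.step()
    for i, value in enumerate(per_example.detach().tolist()):
        loss_trajectories[i].append(value)

# Simple learnability proxy: how far each example's loss dropped over training.
learnability = {i: traj[0] - traj[-1] for i, traj in loss_trajectories.items()}
```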
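The probing sketch below is a minimal harness for the leakage measurement, assuming any prompt‑to‑completion callable wrapping a fine‑tuned model; the stubbed model in the usage line is purely hypothetical.

```python
from typing import Callable, Iterable, Tuple

def leakage_rate(
    generate: Callable[[str], str],
    probes: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of probes whose completion contains the exact PII string.

    Each probe pairs a triggering code context with the ground-truth PII
    that the context might cause the model to reveal.
    """
    probes = list(probes)
    leaks = sum(1 for context, secret in probes if secret in generate(context))
    return leaks / len(probes) if probes else 0.0

# Hypothetical usage with a stubbed "model" that always emits an IP address.
stub = lambda prompt: prompt + " 192.168.0.1"
print(leakage_rate(stub, [("HOST =", "192.168.0.1"), ("API_KEY =", "sk_live_abc")]))  # 0.5
```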
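The effect‑estimation sketch below approximates the per‑type analysis with simple regression adjustment on synthetic data: regress the leak indicator on learnability plus confounders (token frequency and length) and read off the learnability coefficient. This linear‑probability shortcut only conveys the spirit of the paper's SCM‑based estimation.

```python
import numpy as np

def learnability_effect(learnability, leaked, confounders):
    """Slope of leakage on learnability within one PII type, adjusting for
    confounders via ordinary least squares (a linear-probability approximation)."""
    X = np.column_stack([np.ones(len(learnability)), learnability, confounders])
    coef, *_ = np.linalg.lstsq(X, leaked, rcond=None)
    return coef[1]  # effect of learnability, holding confounders fixed

# Synthetic data for 200 instances of a single PII type.
rng = np.random.default_rng(0)
token_freq = rng.poisson(5, 200)
token_len = rng.integers(8, 40, 200)
learn = rng.normal(0.0, 1.0, 200) + 0.1 * token_freq
leak_prob = 1.0 / (1.0 + np.exp(-(1.5 * learn - 2.0)))
leaked = (rng.random(200) < leak_prob).astype(float)

print(learnability_effect(learn, leaked, np.column_stack([token_freq, token_len])))
```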
Results & Findings
| PII Type | Learnability (average loss drop) | Leakage Rate (post‑training) | Causal Effect |
|---|---|---|---|
| IP address | High (fast loss reduction) | ≈ 22 % | Strong positive |
| Email address | Medium | ≈ 12 % | Moderate |
| API key | Low‑medium | ≈ 5 % | Weak |
| Password / secret key | Low (slow learning) | ≈ 1 % | Negligible |
| Ambiguous identifiers (e.g., usernames) | Mixed | 4‑15 % (varied) | Inconsistent |
Key takeaways:
- Learnability predicts leakage. Instances that the model quickly fits (low loss, high confidence) are far more likely to be reproduced verbatim.
- Scale matters, but not uniformly. The larger 2.7 B model shows higher overall leakage, yet the relative ordering across PII types stays the same.
- Ambiguity introduces noise. When a token can appear both as PII and as a benign identifier, the causal link weakens, leading to mixed leakage behavior.
Practical Implications
- Targeted data sanitization: Instead of blanket removal of all PII, developers can prioritize scrubbing high‑learnability items (e.g., IPs, emails) that pose the greatest leakage risk.
- Learnability‑aware regularization: Training pipelines could incorporate dynamic loss weighting that penalizes rapid memorization of sensitive tokens, reducing their causal impact on leakage (a loss‑weighting sketch follows this list).
- Model‑level monitoring: By tracking training‑dynamics metrics in real time, teams can flag “hot” PII examples that are being memorized and intervene before deployment.
- Policy & compliance tooling: The causal framework offers a quantitative basis for compliance reports (e.g., GDPR) by showing which data categories are most vulnerable to accidental exposure.
- Design of safer code assistants: Product teams can embed type‑specific redaction rules (e.g., mask IP addresses in completions) without degrading overall code suggestion quality (a redaction sketch also follows this list).
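As one possible reading of learnability‑aware regularization, the loss‑weighting sketch below down‑weights sensitive token positions whose loss has already fallen, reducing the training pressure to memorize them verbatim. The sensitive‑position mask, the EMA‑based weight, and the demo tensors are assumptions; the paper does not prescribe or evaluate this exact mechanism.

```python
import torch
import torch.nn.functional as F

def learnability_aware_loss(logits, targets, sensitive_mask, loss_ema, alpha=0.9):
    """Cross-entropy with dynamic down-weighting of rapidly memorized PII tokens.

    sensitive_mask: bool tensor marking PII token positions (from taxonomy labels).
    loss_ema: running average of each position's past loss; sensitive positions
    whose loss has fallen (i.e., are being memorized) receive a smaller weight.
    """
    vocab = logits.size(-1)
    per_token = F.cross_entropy(
        logits.reshape(-1, vocab), targets.reshape(-1), reduction="none"
    ).reshape(targets.shape)
    loss_ema = alpha * loss_ema + (1 - alpha) * per_token.detach()
    # Weight in (0, 1]: sensitive tokens with low remaining loss contribute little;
    # non-sensitive tokens keep full weight.
    weights = torch.where(sensitive_mask, loss_ema / (loss_ema + 1.0), torch.ones_like(loss_ema))
    return (weights * per_token).mean(), loss_ema

# Tiny demo with random tensors (batch of 2, sequence length 5, vocab size 50).
logits = torch.randn(2, 5, 50)
targets = torch.randint(0, 50, (2, 5))
mask = torch.zeros(2, 5, dtype=torch.bool)
mask[0, 2] = True                     # pretend position (0, 2) holds a PII token
loss, ema = learnability_aware_loss(logits, targets, mask, torch.ones(2, 5))
```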
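And a minimal redaction sketch for suggestion‑time masking of the highest‑risk, high‑learnability types; the regex patterns and placeholder tokens are illustrative, not a vetted production filter.

```python
import re

# Illustrative rules targeting the PII types with the strongest leakage effect.
REDACTION_RULES = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP_REDACTED>"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "<EMAIL_REDACTED>"),
]

def redact(completion: str) -> str:
    """Mask matching PII types in a code suggestion before it reaches the user."""
    for pattern, placeholder in REDACTION_RULES:
        completion = pattern.sub(placeholder, completion)
    return completion

print(redact("connect('10.0.0.2', user='alice@corp.example')"))
# connect('<IP_REDACTED>', user='<EMAIL_REDACTED>')
```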
Limitations & Future Work
- Dataset scope: The study relies on publicly available GitHub data; private repositories or non‑English codebases may exhibit different dynamics.
- Model diversity: Only two model sizes from a single architecture family were examined; transformer variants, retrieval‑augmented models, or instruction‑tuned LLMs could behave differently.
- Causal assumptions: The SCM treats training dynamics as the sole mediator; other latent factors (e.g., data duplication, tokenization quirks) might also influence leakage.
- Defensive evaluation: While the paper proposes type‑aware defenses, it does not empirically test their effectiveness in a production setting.
Future research directions include expanding the taxonomy to cover emerging PII (e.g., OAuth tokens), applying the causal analysis to multimodal code models, and building automated tooling that integrates learnability monitoring into CI/CD pipelines.
Authors
- Hua Yang
- Alejandro Velasco
- Sen Fang
- Bowen Xu
- Denys Poshyvanyk
Paper Information
- arXiv ID: 2512.07814v1
- Categories: cs.SE, cs.AI, cs.CR
- Published: December 8, 2025