[Paper] Exposing and Defending Membership Leakage in Vulnerability Prediction Models
Source: arXiv - 2512.08291v1
Overview
The paper investigates a hidden privacy risk in machine‑learning models that predict software vulnerabilities: membership inference attacks (MIAs) that can reveal whether a particular piece of code was part of the model’s training set. By systematically evaluating several popular neural architectures (LSTM, BiGRU, CodeBERT) and different output signals, the authors show that these models can leak sensitive membership information. They also introduce a lightweight defense, Noise‑based Membership Inference Defense (NMID), that dramatically cuts attack success while leaving prediction accuracy essentially unchanged.
Key Contributions
- First comprehensive MIA study for vulnerability‑prediction (VP) models, covering multiple neural architectures and feature combinations.
- Empirical evidence that logits and loss values are the most exploitable signals for membership leakage in code‑analysis tasks.
- Design of NMID, a simple output‑masking and Gaussian‑noise injection module that can be plugged into any VP model.
- Extensive evaluation demonstrating that NMID drops the attack AUC from nearly 1.0 to roughly 0.62–0.66 with negligible impact on the model’s vulnerability‑detection performance.
- Threat‑model articulation for realistic black‑box and gray‑box scenarios where only prediction outputs are observable.
Methodology
- Threat Model – The attacker can query a deployed VP model and observe its outputs (e.g., predicted probabilities, logits, loss). No internal weights are required (black‑box), but the attacker may also know the model architecture (gray‑box).
- Target Models – Three representative neural VP models were trained on large open‑source repositories (a minimal CodeBERT‑based sketch follows after this list):
  - LSTM‑based sequence model
  - BiGRU‑based sequence model
  - CodeBERT (a transformer pre‑trained on source code)
- Attack Features – For each query the attacker extracts one or more of the following: raw logits, softmax confidence, loss value, and embedding vectors.
- Membership Inference – A binary classifier (typically a shallow MLP) is trained on a shadow dataset to distinguish “member” vs. “non‑member” samples using the extracted features (see the attack sketch after this list).
- Defense (NMID) – Before returning outputs, the VP model passes them through NMID (see the defense sketch after this list), which:
  - Masks the most sensitive dimensions of the output vector, and
  - Adds calibrated Gaussian noise (σ tuned to preserve utility).
- Evaluation Metrics – Attack success is measured by Area Under the ROC Curve (AUC). Model utility is measured by standard VP metrics (Precision, Recall, F1‑score).
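As a concrete reference point for what such a target model looks like, below is a minimal sketch of a CodeBERT‑based VP classifier using the Hugging Face `transformers` library. The specifics (a single linear head over the first token, the `microsoft/codebert-base` checkpoint) are illustrative assumptions, not the paper's exact architecture or hyperparameters.

```python
# Minimal sketch of a CodeBERT-based VP classifier (illustrative assumptions,
# not the paper's exact setup): a linear classification head over the encoder output.
import torch.nn as nn
from transformers import AutoModel

class CodeBertVP(nn.Module):
    def __init__(self, checkpoint="microsoft/codebert-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.head = nn.Linear(self.encoder.config.hidden_size, 2)  # vulnerable / not vulnerable

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden[:, 0])  # classify from the first (<s>/[CLS]) token representation
```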
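The membership‑inference pipeline can be sketched as follows. This is a minimal illustration of the shadow‑model attack described above, not the authors' code: `shadow_model`, `shadow_members`, and `shadow_nonmembers` are placeholder names for a shadow VP model (assumed to take a single tensor and return logits) and its member/non‑member splits, and the attack classifier is a small MLP trained on logits, confidence, and loss.

```python
# Sketch of the shadow-model membership inference attack (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

@torch.no_grad()
def attack_features(model, loader):
    """Collect the signals an attacker can observe per query:
    raw logits, softmax confidence, and per-sample loss."""
    model.eval()
    feats = []
    for x, y in loader:
        logits = model(x)                                     # [B, 2] vulnerability logits
        conf = F.softmax(logits, dim=-1).max(dim=-1).values   # top-class confidence
        loss = F.cross_entropy(logits, y, reduction="none")   # per-sample loss
        feats.append(torch.cat([logits, conf.unsqueeze(1), loss.unsqueeze(1)], dim=1))
    return torch.cat(feats)

# Attack training set: features from shadow members (label 1) vs. non-members (label 0).
f_in  = attack_features(shadow_model, DataLoader(shadow_members, batch_size=64))
f_out = attack_features(shadow_model, DataLoader(shadow_nonmembers, batch_size=64))
X = torch.cat([f_in, f_out])
y = torch.cat([torch.ones(len(f_in)), torch.zeros(len(f_out))])

# Shallow MLP attack classifier, as described in the paper.
attacker = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(attacker.parameters(), lr=1e-3)
for _ in range(20):
    for xb, yb in DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(attacker(xb).squeeze(1), yb)
        loss.backward()
        opt.step()

# At inference time, the attacker extracts the same features from queries against the
# *target* model and thresholds the attacker's score to decide membership (scored by ROC AUC).
```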
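NMID itself is described as output masking plus calibrated Gaussian noise. The wrapper below is an illustrative sketch based on that description rather than the authors' implementation; which dimensions to mask and the exact noise calibration are placeholders.

```python
# Illustrative NMID-style wrapper (a sketch based on the paper's description,
# not the authors' implementation): mask selected output dimensions, then add
# calibrated Gaussian noise before anything leaves the model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NMIDWrapper(nn.Module):
    def __init__(self, vp_model, masked_dims=(), sigma=0.2):
        super().__init__()
        self.vp_model = vp_model        # any trained VP model that returns logits
        self.masked_dims = masked_dims  # output dimensions to suppress before release (placeholder)
        self.sigma = sigma              # noise scale; the paper reports sigma = 0.2-0.3 as a sweet spot

    @torch.no_grad()
    def forward(self, x):
        logits = self.vp_model(x).clone()
        # 1) Mask the most sensitive output dimensions.
        for d in self.masked_dims:
            logits[:, d] = 0.0
        # 2) Inject calibrated Gaussian noise into the outputs.
        logits = logits + self.sigma * torch.randn_like(logits)
        # Only the perturbed probabilities are ever exposed to the caller.
        return F.softmax(logits, dim=-1)

# Deployment: expose only the wrapped model through the prediction API, e.g.
# protected = NMIDWrapper(vp_model, sigma=0.2); api_output = protected(code_batch)
```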
Results & Findings
| Model | Feature Used | Attack AUC (no defense) |
|---|---|---|
| LSTM | Logits | 0.98 |
| BiGRU | Loss | 0.97 |
| CodeBERT | Logits | 0.99 |
- Logits and loss consistently yielded the highest AUC, confirming they leak the most membership information.
- Embedding‑only attacks performed poorly (AUC ≈ 0.55), indicating raw representations are less exploitable.
- NMID Effectiveness – After applying NMID (σ = 0.2), attack AUC dropped to 0.62–0.66 across all models, while VP performance degraded by < 2 % in F1‑score.
- Utility‑Privacy Trade‑off – Increasing noise further reduces AUC but begins to hurt detection accuracy; the authors identify a sweet spot around σ = 0.2–0.3 for most settings (a tuning sketch follows below).
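Locating that sweet spot in a new deployment amounts to sweeping σ and tracking both attack AUC and detection F1. The loop below is a hypothetical sketch: `attack_auc` and `vp_f1` stand in for whatever evaluation routines the deployment already has (e.g., built on `sklearn.metrics.roc_auc_score` and `f1_score` over held‑out data), and `NMIDWrapper` refers to the defense sketch above.

```python
# Hypothetical sigma sweep to visualize the utility-privacy trade-off.
# attack_auc(model) and vp_f1(model) are placeholders for existing evaluation code.
for sigma in (0.0, 0.1, 0.2, 0.3, 0.5):
    protected = NMIDWrapper(vp_model, sigma=sigma)  # wrapper from the defense sketch
    print(f"sigma={sigma:.1f}  attack AUC={attack_auc(protected):.3f}  F1={vp_f1(protected):.3f}")
```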
Practical Implications
- Security‑tool vendors should treat model outputs (especially logits and loss) as potentially sensitive and consider masking or noise injection before exposing them via APIs.
- CI/CD pipelines that expose VP‑model predictions on proprietary code can do so with reduced risk of revealing which code fragments were used for training.
- Compliance – Organizations handling regulated code (e.g., medical device firmware) can leverage NMID to meet privacy‑by‑design requirements without sacrificing defect‑detection capabilities.
- Open‑source model sharing – When publishing pretrained VP models, developers can ship NMID‑enabled checkpoints, giving downstream users a ready‑made privacy safeguard.
- Generalization – The approach is lightweight enough to be applied to other code‑analysis tasks (e.g., code clone detection, defect prediction), where similar leakage patterns are likely.
Limitations & Future Work
- Dataset Scope – Experiments were limited to a few large open‑source repositories; results may differ on highly proprietary or domain‑specific codebases.
- Attack Sophistication – The study focuses on standard shadow‑model attacks; adaptive adversaries that fine‑tune their classifiers or exploit side‑channel information were not explored.
- Noise Calibration – NMID relies on manually selecting the Gaussian noise scale; automated privacy budgeting (e.g., differential privacy) could provide stronger guarantees.
- Model Types – Only three neural architectures were examined; future work should assess newer graph‑based or hybrid models that may exhibit different leakage characteristics.
Overall, the paper shines a light on an overlooked privacy vector in AI‑driven software security and offers a pragmatic, low‑overhead defense that developers can adopt today.
Authors
- Yihan Liao
- Jacky Keung
- Xiaoxue Ma
- Jingyu Zhang
- Yicheng Sun
Paper Information
- arXiv ID: 2512.08291v1
- Categories: cs.CR, cs.SE
- Published: December 9, 2025
- PDF: https://arxiv.org/pdf/2512.08291v1