[Paper] Exposing and Defending Membership Leakage in Vulnerability Prediction Models
Source: arXiv - 2512.08291v1
Overview
The paper investigates a hidden privacy risk in machine‑learning models that predict software vulnerabilities: membership inference attacks (MIAs) that can reveal whether a particular piece of code was part of the model’s training set. By systematically evaluating several popular neural architectures (LSTM, BiGRU, CodeBERT) and different output signals, the authors show that these models can leak sensitive membership information. They also introduce a lightweight defense, Noise‑based Membership Inference Defense (NMID), that dramatically cuts attack success while leaving prediction accuracy essentially unchanged.
Key Contributions
- First comprehensive MIA study for vulnerability‑prediction (VP) models, covering multiple neural architectures and feature combinations.
- Empirical evidence that logits and loss values are the most exploitable signals for membership leakage in code‑analysis tasks.
- Design of NMID, a simple output‑masking and Gaussian‑noise injection module that can be plugged into any VP model.
- Extensive evaluation demonstrating that NMID drops the attack AUC from nearly 1.0 to roughly 0.62–0.66 with negligible impact on the model’s vulnerability‑detection performance.
- Threat‑model articulation for realistic black‑box and gray‑box scenarios where only prediction outputs are observable.
Methodology
- Threat Model – The attacker can query a deployed VP model and observe its outputs (e.g., predicted probabilities, logits, loss). No internal weights are required (black‑box), but the attacker may also know the model architecture (gray‑box).
- Target Models – Three representative neural VP models were trained on large open‑source repositories (a minimal CodeBERT‑based sketch follows after this list):
  - LSTM‑based sequence model
  - BiGRU‑based sequence model
  - CodeBERT (a transformer pre‑trained on source code)
- Attack Features – For each query the attacker extracts one or more of the following: raw logits, softmax confidence, loss value, and embedding vectors.
- Membership Inference – A binary classifier (typically a shallow MLP) is trained on a shadow dataset to distinguish “member” vs. “non‑member” samples using the extracted features (see the attack sketch after this list).
- Defense (NMID) – Before returning outputs, the VP model passes them through NMID (see the defense sketch after this list), which:
  - Masks the most sensitive dimensions of the output vector, and
  - Adds calibrated Gaussian noise (σ tuned to preserve utility).
- Evaluation Metrics – Attack success is measured by Area Under the ROC Curve (AUC). Model utility is measured by standard VP metrics (Precision, Recall, F1‑score).
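As a concrete reference point for what such a target model looks like, below is a minimal sketch of a CodeBERT‑based VP classifier using the Hugging Face `transformers` library. The specifics (a single linear head over the first token, the `microsoft/codebert-base` checkpoint) are illustrative assumptions, not the paper's exact architecture or hyperparameters.

```python
# Minimal sketch of a CodeBERT-based VP classifier (illustrative assumptions,
# not the paper's exact setup): a linear classification head over the encoder output.
import torch.nn as nn
from transformers import AutoModel

class CodeBertVP(nn.Module):
    def __init__(self, checkpoint="microsoft/codebert-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.head = nn.Linear(self.encoder.config.hidden_size, 2)  # vulnerable / not vulnerable

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden[:, 0])  # classify from the first (<s>/[CLS]) token representation
```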
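The membership‑inference pipeline can be sketched as follows. This is a minimal illustration of the shadow‑model attack described above, not the authors' code: `shadow_model`, `shadow_members`, and `shadow_nonmembers` are placeholder names for a shadow VP model (assumed to take a single tensor and return logits) and its member/non‑member splits, and the attack classifier is a small MLP trained on logits, confidence, and loss.

```python
# Sketch of the shadow-model membership inference attack (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

@torch.no_grad()
def attack_features(model, loader):
    """Collect the signals an attacker can observe per query:
    raw logits, softmax confidence, and per-sample loss."""
    model.eval()
    feats = []
    for x, y in loader:
        logits = model(x)                                     # [B, 2] vulnerability logits
        conf = F.softmax(logits, dim=-1).max(dim=-1).values   # top-class confidence
        loss = F.cross_entropy(logits, y, reduction="none")   # per-sample loss
        feats.append(torch.cat([logits, conf.unsqueeze(1), loss.unsqueeze(1)], dim=1))
    return torch.cat(feats)

# Attack training set: features from shadow members (label 1) vs. non-members (label 0).
f_in  = attack_features(shadow_model, DataLoader(shadow_members, batch_size=64))
f_out = attack_features(shadow_model, DataLoader(shadow_nonmembers, batch_size=64))
X = torch.cat([f_in, f_out])
y = torch.cat([torch.ones(len(f_in)), torch.zeros(len(f_out))])

# Shallow MLP attack classifier, as described in the paper.
attacker = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(attacker.parameters(), lr=1e-3)
for _ in range(20):
    for xb, yb in DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(attacker(xb).squeeze(1), yb)
        loss.backward()
        opt.step()

# At inference time, the attacker extracts the same features from queries against the
# *target* model and thresholds the attacker's score to decide membership (scored by ROC AUC).
```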
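NMID itself is described as output masking plus calibrated Gaussian noise. The wrapper below is an illustrative sketch based on that description rather than the authors' implementation; which dimensions to mask and the exact noise calibration are placeholders.

```python
# Illustrative NMID-style wrapper (a sketch based on the paper's description,
# not the authors' implementation): mask selected output dimensions, then add
# calibrated Gaussian noise before anything leaves the model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NMIDWrapper(nn.Module):
    def __init__(self, vp_model, masked_dims=(), sigma=0.2):
        super().__init__()
        self.vp_model = vp_model        # any trained VP model that returns logits
        self.masked_dims = masked_dims  # output dimensions to suppress before release (placeholder)
        self.sigma = sigma              # noise scale; the paper reports sigma = 0.2-0.3 as a sweet spot

    @torch.no_grad()
    def forward(self, x):
        logits = self.vp_model(x).clone()
        # 1) Mask the most sensitive output dimensions.
        for d in self.masked_dims:
            logits[:, d] = 0.0
        # 2) Inject calibrated Gaussian noise into the outputs.
        logits = logits + self.sigma * torch.randn_like(logits)
        # Only the perturbed probabilities are ever exposed to the caller.
        return F.softmax(logits, dim=-1)

# Deployment: expose only the wrapped model through the prediction API, e.g.
# protected = NMIDWrapper(vp_model, sigma=0.2); api_output = protected(code_batch)
```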
Results & Findings
| Model | Feature Used | Attack AUC (no defense) |
|---|---|---|
| LSTM | Logits | 0.98 |
| BiGRU | Loss | 0.97 |
| CodeBERT | Logits | 0.99 |
- Logits and loss consistently yielded the highest AUC, confirming they leak the most membership information.
- Embedding‑only attacks performed poorly (AUC ≈ 0.55), indicating raw representations are less exploitable.
- NMID Effectiveness – After applying NMID (σ = 0.2), attack AUC dropped to 0.62–0.66 across all models, while VP performance degraded by < 2 % in F1‑score.
- Utility‑Privacy Trade‑off – Increasing noise further reduces AUC but begins to hurt detection accuracy; the authors identify a sweet spot around σ = 0.2–0.3 for most settings (a tuning sketch follows below).
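Locating that sweet spot in a new deployment amounts to sweeping σ and tracking both attack AUC and detection F1. The loop below is a hypothetical sketch: `attack_auc` and `vp_f1` stand in for whatever evaluation routines the deployment already has (e.g., built on `sklearn.metrics.roc_auc_score` and `f1_score` over held‑out data), and `NMIDWrapper` refers to the defense sketch above.

```python
# Hypothetical sigma sweep to visualize the utility-privacy trade-off.
# attack_auc(model) and vp_f1(model) are placeholders for existing evaluation code.
for sigma in (0.0, 0.1, 0.2, 0.3, 0.5):
    protected = NMIDWrapper(vp_model, sigma=sigma)  # wrapper from the defense sketch
    print(f"sigma={sigma:.1f}  attack AUC={attack_auc(protected):.3f}  F1={vp_f1(protected):.3f}")
```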
Practical Implications
- Security‑tool vendors should treat model outputs (especially logits and loss) as potentially sensitive and consider masking or noise injection before exposing them via APIs.
- CI/CD pipelines that expose VP‑model predictions on proprietary code can do so with reduced risk of revealing which code fragments were used for training.
- Compliance – Organizations handling regulated code (e.g., medical device firmware) can leverage NMID to meet privacy‑by‑design requirements without sacrificing defect‑detection capabilities.
- Open‑source model sharing – When publishing pretrained VP models, developers can ship NMID‑enabled checkpoints, giving downstream users a ready‑made privacy safeguard.
- Generalization – The approach is lightweight enough to be applied to other code‑analysis tasks (e.g., code clone detection, defect prediction), where similar leakage patterns are likely.
Limitations & Future Work
- Dataset Scope – Experiments were limited to a few large open‑source repositories; results may differ on highly proprietary or domain‑specific codebases.
- Attack Sophistication – The study focuses on standard shadow‑model attacks; adaptive adversaries that fine‑tune their classifiers or exploit side‑channel information were not explored.
- Noise Calibration – NMID relies on manually selecting the Gaussian noise scale; automated privacy budgeting (e.g., differential privacy) could provide stronger guarantees.
- Model Types – Only three neural architectures were examined; future work should assess newer graph‑based or hybrid models that may exhibit different leakage characteristics.
Overall, the paper shines a light on an overlooked privacy vector in AI‑driven software security and offers a pragmatic, low‑overhead defense that developers can adopt today.
Authors
- Yihan Liao
- Jacky Keung
- Xiaoxue Ma
- Jingyu Zhang
- Yicheng Sun
Paper Information
- arXiv ID: 2512.08291v1
- Categories: cs.CR, cs.SE
- Published: December 9, 2025
- PDF: https://arxiv.org/pdf/2512.08291v1