[Paper] Code Fingerprints: Disentangled Attribution of LLM-Generated Code
Source: arXiv - 2603.04212v1
Overview
Large Language Models (LLMs) are now writing a huge chunk of production code, but knowing which model produced a snippet is becoming a real operational need—think security triage, licensing checks, or post‑mortem investigations. The paper “Code Fingerprints: Disentangled Attribution of LLM‑Generated Code” tackles this problem head‑on by introducing a way to attribute generated code back to the specific LLM that created it, opening the door to model‑level provenance tracking in everyday software engineering.
Key Contributions
- Model‑level code attribution: First systematic study of identifying the exact LLM (e.g., ChatGPT, Claude) that generated a piece of code.
- Disentangled Code Attribution Network (DCAN): A neural architecture that separates semantic (what the code does) from stylistic (how the code looks) signals, enabling reliable fingerprinting.
- Contrastive learning objective: Trains DCAN to amplify model‑specific quirks while suppressing task‑related content, making the fingerprint robust across programming tasks.
- Large‑scale benchmark: Curated dataset of > 100 k code snippets generated by four popular LLMs (DeepSeek, Claude, Qwen, ChatGPT) in Python, Java, C, and Go.
- Open‑source release: Code, data, and pretrained models are publicly available, encouraging reproducibility and downstream tooling.
Methodology
- Data collection – The authors prompted each LLM with a common set of programming problems (e.g., implement a linked list, solve a sorting task) across four languages, capturing the raw output.
- Feature disentanglement – DCAN consists of two parallel encoders:
  - Semantic encoder – trained to preserve functional intent (e.g., via execution traces or AST semantics).
  - Stylistic encoder – captures token‑level patterns, indentation, naming conventions, and subtle token‑distribution quirks.
- Contrastive loss – During training, pairs of code snippets from the same model are pulled together in the stylistic embedding space, while snippets from different models are pushed apart. The semantic encoder is regularized to stay invariant to the model source.
- Multi‑class classifier – The final attribution head reads the disentangled stylistic vector and predicts one of the candidate LLMs.
- Evaluation protocol – Accuracy, macro‑F1, and confusion matrices are reported per language and across mixed‑language batches to assess robustness.
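The core of this pipeline is the contrastive objective on the stylistic embeddings. The paper does not publish its exact loss formulation here, so the following is a minimal NumPy sketch of one standard supervised contrastive loss that matches the description: snippets from the same model (positives) are pulled together, snippets from different models are pushed apart. The function name and temperature value are illustrative assumptions.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, model_ids, temperature=0.1):
    """Sketch of the contrastive objective described in the paper (details assumed):
    same-model snippet embeddings are pulled together, different models pushed apart."""
    # L2-normalize so similarities are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature            # temperature-scaled pairwise similarities
    n = len(model_ids)
    mask = ~np.eye(n, dtype=bool)          # exclude self-similarity from the softmax
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and model_ids[j] == model_ids[i]]
        if not positives:
            continue
        # log-sum-exp over all non-self pairs forms the softmax denominator
        log_denom = np.log(np.exp(sim[i][mask[i]]).sum())
        # average negative log-probability of the same-model positives
        losses.append(-(sim[i][positives] - log_denom).mean())
    return float(np.mean(losses))
```

With embeddings that cluster by model, correct model labels yield a lower loss than shuffled ones, which is exactly the gradient signal that sharpens model-specific style while the semantic encoder is separately regularized to ignore it.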
Results & Findings
| Metric (overall) | Accuracy | Macro‑F1 |
|---|---|---|
| DCAN (4‑class) | 84.7 % | 0.83 |
| Baseline (raw fine‑tuned BERT) | 62.1 % | 0.60 |
| Baseline (n‑gram fingerprint) | 48.3 % | 0.45 |
- Cross‑language stability: Accuracy only dropped ~3 % when the model was evaluated on a language it hadn’t seen during training, showing that stylistic cues transfer across languages.
- Robustness to prompt variation: Even when the same problem was phrased differently, DCAN maintained attribution accuracy above 80 %, indicating that the fingerprints are not merely prompt‑specific.
- Ablation: Removing the contrastive loss reduced performance by ~12 %, confirming its importance for isolating model‑specific style.
These numbers demonstrate that LLMs leave detectable “code fingerprints” that can be reliably extracted without executing the code.
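For intuition about what the weakest baseline in the table is doing, here is a minimal sketch of a character n‑gram fingerprint classifier. The paper does not specify its n‑gram baseline's exact features, so treat the choice of character trigrams and cosine matching as illustrative assumptions.

```python
from collections import Counter

def ngram_profile(code, n=3):
    """Character n-gram frequency profile — a crude stylistic fingerprint."""
    return Counter(code[i:i + n] for i in range(len(code) - n + 1))

def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    num = sum(p[g] * q[g] for g in set(p) & set(q))
    den = (sum(v * v for v in p.values()) ** 0.5) * (sum(v * v for v in q.values()) ** 0.5)
    return num / den if den else 0.0

def attribute(snippet, model_profiles):
    """Attribute a snippet to the model whose aggregate profile is most similar."""
    probe = ngram_profile(snippet)
    return max(model_profiles, key=lambda m: cosine(probe, model_profiles[m]))
```

A baseline like this picks up surface cues (naming conventions, indentation, punctuation habits) but, unlike DCAN, has no mechanism to separate them from task content, which is consistent with its much lower accuracy in the table above.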
Practical Implications
- Security & vulnerability triage – When a newly discovered flaw appears in a codebase, DCAN can quickly point to the originating LLM, helping teams assess whether the issue is a model‑wide bug or an isolated prompt mishap.
- License compliance – Companies can audit large code repositories to ensure that proprietary or restricted‑license models (e.g., a paid internal LLM) are not inadvertently leaking code into open‑source projects.
- Incident forensics – In post‑mortem analyses, knowing the exact model that generated a failing component can guide targeted model updates or prompt‑engineering fixes.
- Tooling integration – DCAN can be wrapped as a lightweight service (REST API or VS Code extension) that tags each generated snippet with a provenance header, enabling CI pipelines to enforce policy rules (e.g., “no code from unapproved models”).
- Model benchmarking – Developers can compare stylistic “noise” across LLMs to choose the one that best matches an organization’s coding standards or readability goals.
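The CI-integration idea above can be sketched as a small policy check. Note that the `# Model-Provenance:` header format and the approved-model list are hypothetical conventions for illustration, not something the paper or DCAN defines.

```python
# Hypothetical CI policy gate: reject diffs containing code tagged with a
# provenance header naming a model outside the approved set. The header
# format "# Model-Provenance: <model>" is an assumed convention.
APPROVED_MODELS = {"claude", "chatgpt"}

def check_provenance(diff_lines):
    """Return (ok, offending_models) for the added lines of a pull request."""
    offending = set()
    for line in diff_lines:
        if line.startswith("# Model-Provenance:"):
            model = line.split(":", 1)[1].strip().lower()
            if model not in APPROVED_MODELS:
                offending.add(model)
    return (not offending, sorted(offending))
```

In practice the tag would be emitted by an attribution service wrapping DCAN; the CI job then only needs this cheap string check rather than running the model itself.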
Limitations & Future Work
- Model coverage – The study only evaluates four LLMs; attribution accuracy may degrade with newer or heavily fine‑tuned models that share training data.
- Short or templated snippets – Extremely short or highly templated outputs (e.g., one‑line getters) carry little stylistic signal, leading to ambiguous attribution.
- Adversarial evasion – A determined user could post‑process generated code (reformatting, renaming) to obscure fingerprints; future work could explore robust, transformation‑invariant embeddings.
- Scalability to dozens of models – Extending DCAN to a large pool of models may require hierarchical or metric‑learning approaches to keep inference fast.
The authors suggest expanding the benchmark to more languages, incorporating open‑source LLMs, and investigating defenses against intentional fingerprint masking.
If you’re interested in trying out DCAN yourself, the authors have published the dataset and code on GitHub (https://github.com/mtt500/DCAN). Plug it into your CI pipeline, and you’ll start seeing model provenance tags appear alongside every auto‑generated pull request.
Authors
- Jiaxun Guo
- Ziyuan Yang
- Mengyu Sun
- Hui Wang
- Jingfeng Lu
- Yi Zhang
Paper Information
- arXiv ID: 2603.04212v1
- Categories: cs.SE, cs.CL
- Published: March 4, 2026