[Paper] LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning
Source: arXiv:2604.16058v1
Overview
LLMSniffer tackles a timely problem: as Large Language Models (LLMs) such as GPT‑4 increasingly write production‑grade code, developers, educators, and security teams need reliable ways to tell whether a snippet was written by a human or generated by an AI. The authors propose a detection framework that fine‑tunes the code‑aware transformer GraphCodeBERT with a two‑stage supervised contrastive learning pipeline, yielding consistent accuracy gains on two established benchmarks.
Key Contributions
- Contrastive fine‑tuning for code detection – Introduces a two‑stage supervised contrastive learning scheme that restructures GraphCodeBERT's embedding space so that AI‑generated and human‑written code become more cleanly separable.
- Comment‑removal preprocessing – Strips comments before training, preventing models from latching onto superficial cues (e.g., “generated by …”) and forcing deeper semantic learning.
- MLP classifier on top of refined embeddings – A lightweight multilayer perceptron translates the contrastively learned representations into a binary decision (LLM‑generated vs. human).
- State‑of‑the‑art results – Improves accuracy from 70 % to 78 % (F1 from 68 % to 78 %) on the GPTSniffer dataset and from 91 % to 94.65 % (F1 from 91 % to 94.64 %) on the Whodunit dataset.
- Open resources – Releases model checkpoints, preprocessing scripts, datasets, and an interactive demo to accelerate reproducibility and downstream research.
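The comment‑removal step can be illustrated for Python sources with the standard `tokenize` module. This is a stand‑in sketch, not the authors' released preprocessing script; handling Java (as in the GPTSniffer dataset) would require language‑specific tooling:

```python
import io
import tokenize

def strip_comments(source: str) -> str:
    """Remove # comments from Python source, leaving code tokens intact."""
    tokens = [
        tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type != tokenize.COMMENT
    ]
    return tokenize.untokenize(tokens)

snippet = "total = 0  # generated by an LLM\nfor i in range(3):\n    total += i\n"
cleaned = strip_comments(snippet)
assert "#" not in cleaned
```

Using the tokenizer rather than a regex avoids accidentally mangling `#` characters inside string literals.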
Methodology
- Data preparation – The authors start with two public corpora (GPTSniffer, Whodunit) that contain paired human‑written and LLM‑generated code snippets. All comments are stripped to avoid trivial detection signals.
- Base encoder – GraphCodeBERT, a transformer pre‑trained on source‑code graphs (AST‑augmented), serves as the backbone because it already captures syntactic and data‑flow information.
- Two‑stage supervised contrastive learning
- Stage 1: The encoder is fine‑tuned with a supervised contrastive loss that pushes embeddings of the same class (human or AI) together while pulling opposite‑class embeddings apart.
- Stage 2: The same encoder is further refined with a standard cross‑entropy loss, now using the more discriminative embedding space.
- Classification head – A simple MLP (two hidden layers) reads the final embeddings and outputs a binary label.
- Evaluation – Accuracy, F1‑score, and t‑SNE visualizations are used to assess both quantitative performance and the quality of the learned embedding clusters.
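The Stage 1 objective is a supervised contrastive (SupCon) loss. A minimal pure‑Python sketch of the per‑batch computation follows; the actual implementation presumably uses batched tensor operations, and the temperature value here is an assumption:

```python
import math

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch of embedding vectors.

    Each embedding is L2-normalized; same-label pairs are pulled
    together and different-label pairs pushed apart.
    """
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no positive pair contributes nothing
        denom = sum(math.exp(dot(z[i], z[a]) / temperature)
                    for a in range(n) if a != i)
        log_probs = [dot(z[i], z[p]) / temperature - math.log(denom)
                     for p in positives]
        total += -sum(log_probs) / len(positives)
        anchors += 1
    return total / anchors

# Well-separated classes yield a lower loss than mixed-up labels.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
assert supcon_loss(batch, [0, 0, 1, 1]) < supcon_loss(batch, [0, 1, 0, 1])
```

Minimizing this loss is what produces the tight, well‑separated clusters seen in the paper's t‑SNE plots before the cross‑entropy stage takes over.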
Results & Findings
| Dataset | Baseline Acc. / F1 | LLMSniffer Acc. / F1 |
|---|---|---|
| GPTSniffer | 70 % / 68 % | 78 % / 78 % |
| Whodunit | 91 % / 91 % | 94.65 % / 94.64 % |
- Embedding separation: t‑SNE plots show tight, well‑separated clusters for human vs. AI code after contrastive fine‑tuning, confirming that the loss function effectively structures the latent space.
- Robustness to comment removal: Even without comment cues, the model retains high performance, indicating it learns deeper code semantics rather than surface patterns.
- Efficiency: The contrastive stage introduces modest training overhead but yields a sizable accuracy gain, making the approach practical for real‑world pipelines.
Practical Implications
- Code review tooling – Integrate LLMSniffer as a pre‑commit hook or CI step to flag AI‑generated snippets, helping teams enforce coding standards or detect potential plagiarism.
- Academic integrity – Universities can automatically screen student submissions for LLM‑generated code, preserving fairness in programming assignments.
- Security auditing – AI‑generated code may carry subtle vulnerabilities; early detection enables security teams to apply targeted static analysis.
- Licensing compliance – Companies can monitor for inadvertent inclusion of code that may be subject to different usage licenses when generated by proprietary LLMs.
- Research foundation – The released checkpoints and demo provide a baseline for future work on multimodal detection (e.g., combining code with commit messages) or extending to other programming languages.
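As one illustration of the pre‑commit/CI use case above, a hook could flag staged files that a detector scores above a threshold. The scorer interface below is a hypothetical stand‑in, not the paper's released API; a real hook would load the LLMSniffer checkpoint in its place:

```python
def flag_ai_generated(files, score_snippet, threshold=0.8):
    """Return the paths whose code the detector flags as likely LLM-generated.

    `score_snippet` is any callable mapping source code to a probability
    in [0, 1] that the code is LLM-generated.
    """
    return [path for path, code in files if score_snippet(code) >= threshold]

# Dummy scorer for demonstration only (keyword-based, not a real model).
def dummy_scorer(code):
    return 0.95 if "TODO: implement" in code else 0.1

staged = [("a.py", "print('hi')"), ("b.py", "# TODO: implement\npass")]
print(flag_ai_generated(staged, dummy_scorer))  # -> ['b.py']
```

Keeping the scorer injectable makes the hook easy to test and lets teams swap in updated checkpoints as detection models are re‑trained.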
Limitations & Future Work
- Language coverage – Experiments focus on a limited set of languages (primarily Python/Java). Extending to less‑common languages may require additional pre‑training.
- Evolving LLMs – As LLMs improve, detection becomes a moving target; the authors note the need for continual re‑training with fresh AI‑generated samples.
- Adversarial evasion – Simple obfuscations (renaming variables, reformatting) could degrade detection; future work could explore robustness against such attacks.
- Explainability – While embeddings separate well, the model does not currently provide human‑readable reasons for its decisions; adding interpretability layers would aid trust in production settings.
Authors
- Mahir Labib Dihan
- Abir Muhtasim
Paper Information
- arXiv ID: 2604.16058v1
- Categories: cs.SE, cs.CL
- Published: April 17, 2026