[Paper] LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning
Source: arXiv:2604.16058v1
Overview
LLMSniffer tackles a timely problem: as Large Language Models (LLMs) such as GPT‑4 increasingly write production‑grade code, developers, educators, and security teams need reliable ways to tell whether a snippet was written by a human or generated by an AI. The authors propose a detection framework that fine‑tunes the code‑aware transformer GraphCodeBERT with a two‑stage supervised contrastive learning pipeline, yielding consistent accuracy gains on two established benchmarks.
Key Contributions
- Contrastive fine‑tuning for code detection – Introduces a two‑stage supervised contrastive learning scheme that restructures GraphCodeBERT's embedding space so that AI‑generated and human‑written code become more cleanly separable.
- Comment‑removal preprocessing – Strips comments before training, preventing models from latching onto superficial cues (e.g., “generated by …”) and forcing deeper semantic learning.
- MLP classifier on top of refined embeddings – A lightweight multilayer perceptron translates the contrastively learned representations into a binary decision (LLM‑generated vs. human).
- State‑of‑the‑art results – Improves accuracy from 70 % to 78 % (F1 from 68 % to 78 %) on the GPTSniffer dataset and from 91 % to 94.65 % (F1 from 91 % to 94.64 %) on the Whodunit dataset.
- Open resources – Releases model checkpoints, preprocessing scripts, datasets, and an interactive demo to accelerate reproducibility and downstream research.
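The comment‑removal step can be illustrated for Python sources with the standard `tokenize` module. This is a stand‑in sketch, not the authors' released preprocessing script; handling Java (as in the GPTSniffer dataset) would require language‑specific tooling:

```python
import io
import tokenize

def strip_comments(source: str) -> str:
    """Remove # comments from Python source, leaving code tokens intact."""
    tokens = [
        tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type != tokenize.COMMENT
    ]
    return tokenize.untokenize(tokens)

snippet = "total = 0  # generated by an LLM\nfor i in range(3):\n    total += i\n"
cleaned = strip_comments(snippet)
assert "#" not in cleaned
```

Using the tokenizer rather than a regex avoids accidentally mangling `#` characters inside string literals.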
Methodology
- Data preparation – The authors start with two public corpora (GPTSniffer, Whodunit) that contain paired human‑written and LLM‑generated code snippets. All comments are stripped to avoid trivial detection signals.
- Base encoder – GraphCodeBERT, a transformer pre‑trained on source‑code graphs (AST‑augmented), serves as the backbone because it already captures syntactic and data‑flow information.
- Two‑stage supervised contrastive learning
- Stage 1: The encoder is fine‑tuned with a supervised contrastive loss that pushes embeddings of the same class (human or AI) together while pulling opposite‑class embeddings apart.
- Stage 2: The same encoder is further refined with a standard cross‑entropy loss, now using the more discriminative embedding space.
- Classification head – A simple MLP (two hidden layers) reads the final embeddings and outputs a binary label.
- Evaluation – Accuracy, F1‑score, and t‑SNE visualizations are used to assess both quantitative performance and the quality of the learned embedding clusters.
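The Stage 1 objective is a supervised contrastive (SupCon) loss. A minimal pure‑Python sketch of the per‑batch computation follows; the actual implementation presumably uses batched tensor operations, and the temperature value here is an assumption:

```python
import math

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch of embedding vectors.

    Each embedding is L2-normalized; same-label pairs are pulled
    together and different-label pairs pushed apart.
    """
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no positive pair contributes nothing
        denom = sum(math.exp(dot(z[i], z[a]) / temperature)
                    for a in range(n) if a != i)
        log_probs = [dot(z[i], z[p]) / temperature - math.log(denom)
                     for p in positives]
        total += -sum(log_probs) / len(positives)
        anchors += 1
    return total / anchors

# Well-separated classes yield a lower loss than mixed-up labels.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
assert supcon_loss(batch, [0, 0, 1, 1]) < supcon_loss(batch, [0, 1, 0, 1])
```

Minimizing this loss is what produces the tight, well‑separated clusters seen in the paper's t‑SNE plots before the cross‑entropy stage takes over.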
Results & Findings
| Dataset | Baseline Acc. / F1 | LLMSniffer Acc. / F1 |
|---|---|---|
| GPTSniffer | 70 % / 68 % | 78 % / 78 % |
| Whodunit | 91 % / 91 % | 94.65 % / 94.64 % |
- Embedding separation: t‑SNE plots show tight, well‑separated clusters for human vs. AI code after contrastive fine‑tuning, confirming that the loss function effectively structures the latent space.
- Robustness to comment removal: Even without comment cues, the model retains high performance, indicating it learns deeper code semantics rather than surface patterns.
- Efficiency: The contrastive stage introduces modest training overhead but yields a sizable accuracy gain, making the approach practical for real‑world pipelines.
Practical Implications
- Code review tooling – Integrate LLMSniffer as a pre‑commit hook or CI step to flag AI‑generated snippets, helping teams enforce coding standards or detect potential plagiarism.
- Academic integrity – Universities can automatically screen student submissions for LLM‑generated code, preserving fairness in programming assignments.
- Security auditing – AI‑generated code may carry subtle vulnerabilities; early detection enables security teams to apply targeted static analysis.
- Licensing compliance – Companies can monitor for inadvertent inclusion of code that may be subject to different usage licenses when generated by proprietary LLMs.
- Research foundation – The released checkpoints and demo provide a baseline for future work on multimodal detection (e.g., combining code with commit messages) or extending to other programming languages.
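As one illustration of the pre‑commit/CI use case above, a hook could flag staged files that a detector scores above a threshold. The scorer interface below is a hypothetical stand‑in, not the paper's released API; a real hook would load the LLMSniffer checkpoint in its place:

```python
def flag_ai_generated(files, score_snippet, threshold=0.8):
    """Return the paths whose code the detector flags as likely LLM-generated.

    `score_snippet` is any callable mapping source code to a probability
    in [0, 1] that the code is LLM-generated.
    """
    return [path for path, code in files if score_snippet(code) >= threshold]

# Dummy scorer for demonstration only (keyword-based, not a real model).
def dummy_scorer(code):
    return 0.95 if "TODO: implement" in code else 0.1

staged = [("a.py", "print('hi')"), ("b.py", "# TODO: implement\npass")]
print(flag_ai_generated(staged, dummy_scorer))  # -> ['b.py']
```

Keeping the scorer injectable makes the hook easy to test and lets teams swap in updated checkpoints as detection models are re‑trained.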
Limitations & Future Work
- Language coverage – Experiments focus on a limited set of languages (primarily Python/Java). Extending to less‑common languages may require additional pre‑training.
- Evolving LLMs – As LLMs improve, detection becomes a moving target; the authors note the need for continual re‑training with fresh AI‑generated samples.
- Adversarial evasion – Simple obfuscations (renaming variables, reformatting) could degrade detection; future work could explore robustness against such attacks.
- Explainability – While embeddings separate well, the model does not currently provide human‑readable reasons for its decisions; adding interpretability layers would aid trust in production settings.
Authors
- Mahir Labib Dihan
- Abir Muhtasim
Paper Information
- arXiv ID: 2604.16058v1
- Categories: cs.SE, cs.CL
- Published: April 17, 2026