[Paper] LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning

Published: April 17, 2026 at 09:32 AM EDT
4 min read
Source: arXiv


Overview

LLMSniffer tackles a timely problem: as Large Language Models (LLMs) like GPT‑4 start writing production‑grade code, developers, educators, and security teams need reliable ways to tell whether a snippet was crafted by a human or generated by an AI. The authors propose a detection framework that fine‑tunes the code‑aware transformer GraphCodeBERT with a two‑stage supervised contrastive learning pipeline, delivering a noticeable boost in accuracy on two established benchmarks.

Key Contributions

  • Contrastive fine‑tuning for code detection – Introduces a two‑stage supervised contrastive learning scheme that sharpens the embedding space of GraphCodeBERT, making AI‑generated and human‑written code easily separable.
  • Comment‑removal preprocessing – Strips comments before training, preventing models from latching onto superficial cues (e.g., “generated by …”) and forcing deeper semantic learning.
  • MLP classifier on top of refined embeddings – A lightweight multilayer perceptron translates the contrastively learned representations into a binary decision (LLM‑generated vs. human).
  • State‑of‑the‑art results – Improves accuracy from 70 % to 78 % (F1 from 68 % to 78 %) on the GPTSniffer dataset and from 91 % to 94.65 % (F1 from 91 % to 94.64 %) on the Whodunit dataset.
  • Open resources – Releases model checkpoints, preprocessing scripts, datasets, and an interactive demo to accelerate reproducibility and downstream research.
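The comment-removal idea can be sketched for Python inputs with the standard `tokenize` module. The paper's actual preprocessing scripts are released separately, so `strip_comments` below is an illustrative stand-in, not the authors' code:

```python
import io
import tokenize

def strip_comments(source: str) -> str:
    """Drop comment tokens from Python source while keeping the code itself."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    # untokenize reconstructs the source from the remaining tokens
    return tokenize.untokenize(kept)
```

Running this on a snippet such as `x = 1  # generated by ChatGPT` removes the telltale comment while leaving the statement intact, so the detector must rely on the code's structure rather than surface cues.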

Methodology

  1. Data preparation – The authors start with two public corpora (GPTSniffer, Whodunit) that contain paired human‑written and LLM‑generated code snippets. All comments are stripped to avoid trivial detection signals.
  2. Base encoder – GraphCodeBERT, a transformer pre‑trained on source‑code graphs (AST‑augmented), serves as the backbone because it already captures syntactic and data‑flow information.
  3. Two‑stage supervised contrastive learning
    • Stage 1: The encoder is fine‑tuned with a supervised contrastive loss that pulls embeddings of the same class (human or AI) together while pushing opposite‑class embeddings apart.
    • Stage 2: The same encoder is further refined with a standard cross‑entropy loss, now using the more discriminative embedding space.
  4. Classification head – A simple MLP (two hidden layers) reads the final embeddings and outputs a binary label.
  5. Evaluation – Accuracy, F1‑score, and t‑SNE visualizations are used to assess both quantitative performance and the quality of the learned embedding clusters.
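Stage 1's objective follows the standard supervised contrastive (SupCon) formulation: for each anchor, same-class embeddings are treated as positives and all other samples as the softmax denominator. A minimal NumPy sketch of that loss (the paper fine-tunes GraphCodeBERT with a framework-native implementation, so this is illustrative only, and it assumes every class has at least two samples in the batch):

```python
import numpy as np

def supcon_loss(embeddings: np.ndarray, labels: np.ndarray,
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss (Khosla et al., 2020) on a batch."""
    # L2-normalize so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # exclude the anchor itself from the softmax denominator
    sim = np.where(self_mask, -np.inf, sim)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # positives: same-class pairs, excluding the anchor itself
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
    return float(per_anchor.mean())
```

When same-class embeddings are tightly clustered the loss is near zero, and it grows as classes intermix, which is exactly the geometry the t-SNE plots in the paper visualize.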

Results & Findings

| Dataset    | Baseline Acc. / F1 | LLMSniffer Acc. / F1 |
| ---------- | ------------------ | -------------------- |
| GPTSniffer | 70 % / 68 %        | 78 % / 78 %          |
| Whodunit   | 91 % / 91 %        | 94.65 % / 94.64 %    |

  • Embedding separation: t‑SNE plots show tight, well‑separated clusters for human vs. AI code after contrastive fine‑tuning, confirming that the loss function effectively structures the latent space.
  • Robustness to comment removal: Even without comment cues, the model retains high performance, indicating it learns deeper code semantics rather than surface patterns.
  • Efficiency: The added contrastive stage adds modest training overhead but yields a sizable accuracy jump, making it practical for real‑world pipelines.

Practical Implications

  • Code review tooling – Integrate LLMSniffer as a pre‑commit hook or CI step to flag AI‑generated snippets, helping teams enforce coding standards or detect potential plagiarism.
  • Academic integrity – Universities can automatically screen student submissions for LLM‑generated code, preserving fairness in programming assignments.
  • Security auditing – AI‑generated code may carry subtle vulnerabilities; early detection enables security teams to apply targeted static analysis.
  • Licensing compliance – Companies can monitor for inadvertent inclusion of code that may be subject to different usage licenses when generated by proprietary LLMs.
  • Research foundation – The released checkpoints and demo provide a baseline for future work on multimodal detection (e.g., combining code with commit messages) or extending to other programming languages.
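The pre-commit idea above can be sketched as a small helper that scores staged files and flags suspected AI-generated ones. Everything here is hypothetical plumbing: `predict` stands in for a call into the released LLMSniffer checkpoint, and the 0.5 threshold is an assumed default, not a value from the paper:

```python
from typing import Callable, Iterable, List

def flag_ai_generated(paths: Iterable[str],
                      predict: Callable[[str], float],
                      threshold: float = 0.5) -> List[str]:
    """Return the paths whose detector score exceeds the threshold.

    `predict` maps a source string to a probability that the code is
    LLM-generated (e.g., a wrapper around the LLMSniffer checkpoint).
    """
    flagged = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            score = predict(f.read())
        if score > threshold:
            flagged.append(path)
    return flagged
```

A CI step would collect the changed files (e.g., from `git diff --name-only`), pass them through this helper, and fail or warn when the flagged list is non-empty.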

Limitations & Future Work

  • Language coverage – Experiments focus on a limited set of languages (primarily Python/Java). Extending to less‑common languages may require additional pre‑training.
  • Evolving LLMs – As LLMs improve, detection becomes a moving target; the authors note the need for continual re‑training with fresh AI‑generated samples.
  • Adversarial evasion – Simple obfuscations (renaming variables, reformatting) could degrade detection; future work could explore robustness against such attacks.
  • Explainability – While embeddings separate well, the model does not currently provide human‑readable reasons for its decisions; adding interpretability layers would aid trust in production settings.

Authors

  • Mahir Labib Dihan
  • Abir Muhtasim

Paper Information

  • arXiv ID: 2604.16058v1
  • Categories: cs.SE, cs.CL
  • Published: April 17, 2026
