[Paper] No Memorization, No Detection: Output Distribution-Based Contamination Detection in Small Language Models
Source: arXiv - 2603.03203v1
Overview
This paper investigates Contamination Detection via output Distribution (CDD)—a technique that flags whether a language model has been trained on a particular dataset by looking at how “peaked” its sampled outputs are. By running controlled experiments on small models (70 M–410 M parameters) and popular benchmark suites (GSM8K, HumanEval, MATH), the author shows that CDD only works when the fine‑tuning process actually memorizes the contaminated examples. Parameter‑efficient fine‑tuning (e.g., low‑rank adapters) can absorb the data without memorizing it, rendering CDD ineffective.
Key Contributions
- Empirical characterization of CDD’s success/failure regimes on small language models.
- Controlled contamination experiments on three widely used evaluation sets, enabling precise measurement of detection accuracy.
- Discovery of a “memorization threshold”: CDD works only when fine‑tuning capacity is high enough to cause verbatim memorization.
- Demonstration that parameter‑efficient fine‑tuning (low‑rank adaptation) can hide contamination from output‑distribution methods.
- Open‑source implementation and reproducible scripts (GitHub link provided).
Methodology
- Model selection – Six transformer models ranging from 70 M to 410 M parameters were fine‑tuned on target tasks.
- Contamination injection – For each benchmark (GSM8K, HumanEval, MATH), a known subset of examples was deliberately added to the fine‑tuning data.
- Fine‑tuning strategies – Two approaches were compared:
- Full‑parameter fine‑tuning (standard SGD updates to all weights).
- Low‑rank adaptation (LoRA) – a parameter‑efficient method that adds small trainable matrices while freezing the base model.
- CDD measurement – After fine‑tuning, the model is prompted with the original test inputs and its output probability distribution is sampled many times. The peakedness (e.g., KL‑divergence from a uniform baseline) is used as the contamination signal.
- Evaluation – Each test example is labeled as “contaminated” or “clean,” and detection accuracy is the fraction of examples whose CDD classification matches that label.
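The measurement step above can be sketched in a few lines. This is a minimal illustration, not the paper's exact statistic: it scores peakedness as one minus the normalized entropy of the empirical distribution over sampled completions, and the `threshold` decision rule is a hypothetical choice for the sketch.

```python
import math
from collections import Counter

def peakedness(samples):
    """Peakedness of a set of sampled outputs: 1 - normalized entropy of
    the empirical output distribution. 1.0 means every sample is identical
    (consistent with verbatim memorization); values near 0 mean the samples
    are spread uniformly over many distinct completions (diffuse)."""
    counts = Counter(samples)
    n = len(samples)
    if len(counts) <= 1:
        return 1.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(len(counts))

def flag_contaminated(samples, threshold=0.5):
    # Hypothetical decision rule: flag an example as contaminated when the
    # sampled outputs concentrate heavily on one completion.
    return peakedness(samples) >= threshold

# A memorized example tends to yield near-identical samples...
memorized = ["x = 42"] * 19 + ["x = 41"]
# ...while a clean example yields diverse completions.
clean = [f"answer {i}" for i in range(20)]
print(flag_contaminated(memorized), flag_contaminated(clean))
```

In practice the `samples` list would come from prompting the fine-tuned model many times with temperature sampling on the same test input; the toy strings here stand in for those generations.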
Results & Findings
| Fine‑tuning method | Model size | Detection accuracy (≈) |
|---|---|---|
| Full‑parameter | 70 M – 410 M | 70 % – 95 % (high when memorization occurs) |
| Low‑rank (LoRA) | 70 M – 410 M | ≈ 50 % (chance level) |
- Memorization is the key: When full‑parameter fine‑tuning leads to verbatim copies of contaminated examples in the model’s weights, CDD reliably spots them.
- Low‑rank adapters learn without memorizing: Even though the model’s performance on the contaminated task improves, the output distribution stays diffuse, causing CDD to miss the contamination entirely.
- Threshold effect: There is a clear transition point where increasing fine‑tuning capacity (more trainable parameters or more epochs) flips the model from non‑memorizing to memorizing, and CDD’s detection accuracy jumps from chance level to strong.
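The accuracies in the table follow from the classification setup described in the methodology, and can be reproduced in miniature. The scores, labels, and threshold below are illustrative stand-ins, not the paper's data: they only show how well-separated peakedness scores (a memorizing full-parameter run) give high accuracy, while uninformative scores (a LoRA run) fall to chance.

```python
def detection_accuracy(scores, labels, threshold):
    """Fraction of examples CDD classifies correctly, treating a
    peakedness score >= threshold as 'contaminated' (label 1)."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)

labels = [1, 1, 0, 0]  # 1 = contaminated, 0 = clean (illustrative)

# Memorizing run: contaminated examples score far above clean ones.
scores_full = [0.90, 0.85, 0.20, 0.10]
print(detection_accuracy(scores_full, labels, 0.5))  # 1.0

# LoRA run: scores stay diffuse regardless of contamination.
scores_lora = [0.30, 0.10, 0.35, 0.05]
print(detection_accuracy(scores_lora, labels, 0.5))  # 0.5 (chance)
```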
Practical Implications
- Data provenance audits: Organizations relying on output‑distribution checks to certify that a model has not been trained on proprietary data should be aware that parameter‑efficient fine‑tuning can bypass these checks.
- Model licensing & compliance: When using LoRA‑style adapters on top of a base model, you may inadvertently introduce copyrighted or sensitive data without any detectable footprint.
- Tooling for developers: The open‑source code can be integrated into CI pipelines to automatically test whether a new fine‑tuning run is likely to memorize its training set.
- Security & IP protection: Companies can design “defensive” fine‑tuning regimes (e.g., limit adapter rank, add regularization) to reduce the risk of accidental data leakage that is hard to detect post‑hoc.
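The “limit adapter rank” suggestion above rests on simple parameter arithmetic: a rank-r adapter on a d_in × d_out weight matrix trains only r·(d_in + d_out) parameters, versus d_in·d_out for full fine‑tuning, which caps the capacity available for verbatim memorization. A sketch with illustrative dimensions (not values reported in the paper):

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters added by one low-rank adapter pair:
    A (d_in x rank) and B (rank x d_out), with the base weight frozen."""
    return rank * (d_in + d_out)

def full_params(d_in, d_out):
    """Trainable parameters when the full weight matrix is updated."""
    return d_in * d_out

# Illustrative: one 1024x1024 projection matrix.
d = 1024
for r in (4, 8, 64):
    frac = lora_params(d, d, r) / full_params(d, d)
    print(f"rank {r}: {frac:.2%} of full fine-tuning's trainable weights")
```

Even at rank 64, the adapter trains well under a tenth of the matrix’s parameters, which is consistent with the paper’s finding that such runs improve task performance without producing the peaked output distributions CDD needs.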
Limitations & Future Work
- Scale: Experiments stop at 410 M parameters; it remains open whether the memorization threshold behaves similarly for larger models (e.g., 1 B+).
- Dataset diversity: Only three benchmark suites were examined; other domains (code, dialogue, multilingual text) might exhibit different memorization dynamics.
- Detection metrics: CDD relies on a single peakedness statistic; combining it with other signals (e.g., gradient‑based probing) could improve robustness.
- Adaptation strategies: The study focuses on LoRA; other parameter‑efficient methods (prefix‑tuning, adapters, IA³) deserve systematic evaluation.
The authors provide the full experimental pipeline on GitHub, making it easy for practitioners to reproduce and extend the analysis.
Authors
- Omer Sela
Paper Information
- arXiv ID: 2603.03203v1
- Categories: cs.AI, cs.CL
- Published: March 3, 2026