[Paper] No Memorization, No Detection: Output Distribution-Based Contamination Detection in Small Language Models
Source: arXiv - 2603.03203v1
Overview
This paper investigates Contamination Detection via output Distribution (CDD)—a technique that flags whether a language model has been trained on a particular dataset by looking at how “peaked” its sampled outputs are. By running controlled experiments on small models (70 M–410 M parameters) and popular benchmark suites (GSM8K, HumanEval, MATH), the author shows that CDD only works when the fine‑tuning process actually memorizes the contaminated examples. Parameter‑efficient fine‑tuning (e.g., low‑rank adapters) can absorb the data without memorizing it, rendering CDD ineffective.
Key Contributions
- Empirical characterization of CDD’s success/failure regimes on small language models.
- Controlled contamination experiments on three widely used evaluation sets, enabling precise measurement of detection accuracy.
- Discovery of a “memorization threshold”: CDD works only when fine‑tuning capacity is high enough to cause verbatim memorization.
- Demonstration that parameter‑efficient fine‑tuning (low‑rank adaptation) can hide contamination from output‑distribution methods.
- Open‑source implementation and reproducible scripts (GitHub link provided).
Methodology
- Model selection – Six transformer models ranging from 70 M to 410 M parameters were fine‑tuned on target tasks.
- Contamination injection – For each benchmark (GSM8K, HumanEval, MATH), a known subset of examples was deliberately added to the fine‑tuning data.
- Fine‑tuning strategies – Two approaches were compared:
- Full‑parameter fine‑tuning (standard SGD updates to all weights).
- Low‑rank adaptation (LoRA) – a parameter‑efficient method that adds small trainable matrices while freezing the base model.
- CDD measurement – After fine‑tuning, the model is prompted with the original test inputs and its output probability distribution is sampled many times. The peakedness (e.g., KL‑divergence from a uniform baseline) is used as the contamination signal.
- Evaluation – Each test example is labeled as “contaminated” or “clean,” and detection accuracy is the fraction of examples whose CDD classification matches that label.
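The measurement step above can be sketched in a few lines. This is a minimal illustration, not the paper's exact statistic: it scores peakedness as one minus the normalized entropy of the empirical distribution over sampled completions, and the `threshold` decision rule is a hypothetical choice for the sketch.

```python
import math
from collections import Counter

def peakedness(samples):
    """Peakedness of a set of sampled outputs: 1 - normalized entropy of
    the empirical output distribution. 1.0 means every sample is identical
    (consistent with verbatim memorization); values near 0 mean the samples
    are spread uniformly over many distinct completions (diffuse)."""
    counts = Counter(samples)
    n = len(samples)
    if len(counts) <= 1:
        return 1.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(len(counts))

def flag_contaminated(samples, threshold=0.5):
    # Hypothetical decision rule: flag an example as contaminated when the
    # sampled outputs concentrate heavily on one completion.
    return peakedness(samples) >= threshold

# A memorized example tends to yield near-identical samples...
memorized = ["x = 42"] * 19 + ["x = 41"]
# ...while a clean example yields diverse completions.
clean = [f"answer {i}" for i in range(20)]
print(flag_contaminated(memorized), flag_contaminated(clean))
```

In practice the `samples` list would come from prompting the fine-tuned model many times with temperature sampling on the same test input; the toy strings here stand in for those generations.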
Results & Findings
| Fine‑tuning method | Model size | Detection accuracy (≈) |
|---|---|---|
| Full‑parameter | 70 M – 410 M | 70 % – 95 % (high when memorization occurs) |
| Low‑rank (LoRA) | 70 M – 410 M | ≈ 50 % (chance level) |
- Memorization is the key: When full‑parameter fine‑tuning leads to verbatim copies of contaminated examples in the model’s weights, CDD reliably spots them.
- Low‑rank adapters learn without memorizing: Even though the model’s performance on the contaminated task improves, the output distribution stays diffuse, causing CDD to miss the contamination entirely.
- Threshold effect: There is a clear transition point where increasing fine‑tuning capacity (more trainable parameters or more epochs) flips the model from non‑memorizing to memorizing, and CDD’s detection accuracy jumps from chance level to strong.
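The accuracies in the table follow from the classification setup described in the methodology, and can be reproduced in miniature. The scores, labels, and threshold below are illustrative stand-ins, not the paper's data: they only show how well-separated peakedness scores (a memorizing full-parameter run) give high accuracy, while uninformative scores (a LoRA run) fall to chance.

```python
def detection_accuracy(scores, labels, threshold):
    """Fraction of examples CDD classifies correctly, treating a
    peakedness score >= threshold as 'contaminated' (label 1)."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(scores)

labels = [1, 1, 0, 0]  # 1 = contaminated, 0 = clean (illustrative)

# Memorizing run: contaminated examples score far above clean ones.
scores_full = [0.90, 0.85, 0.20, 0.10]
print(detection_accuracy(scores_full, labels, 0.5))  # 1.0

# LoRA run: scores stay diffuse regardless of contamination.
scores_lora = [0.30, 0.10, 0.35, 0.05]
print(detection_accuracy(scores_lora, labels, 0.5))  # 0.5 (chance)
```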
Practical Implications
- Data provenance audits: Organizations relying on output‑distribution checks to certify that a model has not been trained on proprietary data should be aware that parameter‑efficient fine‑tuning can bypass these checks.
- Model licensing & compliance: When using LoRA‑style adapters on top of a base model, you may inadvertently introduce copyrighted or sensitive data without any detectable footprint.
- Tooling for developers: The open‑source code can be integrated into CI pipelines to automatically test whether a new fine‑tuning run is likely to memorize its training set.
- Security & IP protection: Companies can design “defensive” fine‑tuning regimes (e.g., limit adapter rank, add regularization) to reduce the risk of accidental data leakage that is hard to detect post‑hoc.
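The “limit adapter rank” suggestion above rests on simple parameter arithmetic: a rank-r adapter on a d_in × d_out weight matrix trains only r·(d_in + d_out) parameters, versus d_in·d_out for full fine‑tuning, which caps the capacity available for verbatim memorization. A sketch with illustrative dimensions (not values reported in the paper):

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters added by one low-rank adapter pair:
    A (d_in x rank) and B (rank x d_out), with the base weight frozen."""
    return rank * (d_in + d_out)

def full_params(d_in, d_out):
    """Trainable parameters when the full weight matrix is updated."""
    return d_in * d_out

# Illustrative: one 1024x1024 projection matrix.
d = 1024
for r in (4, 8, 64):
    frac = lora_params(d, d, r) / full_params(d, d)
    print(f"rank {r}: {frac:.2%} of full fine-tuning's trainable weights")
```

Even at rank 64, the adapter trains well under a tenth of the matrix’s parameters, which is consistent with the paper’s finding that such runs improve task performance without producing the peaked output distributions CDD needs.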
Limitations & Future Work
- Scale: Experiments stop at 410 M parameters; it remains open whether the memorization threshold behaves similarly for larger models (e.g., 1 B+).
- Dataset diversity: Only three benchmark suites were examined; other domains (code, dialogue, multilingual text) might exhibit different memorization dynamics.
- Detection metrics: CDD relies on a single peakedness statistic; combining it with other signals (e.g., gradient‑based probing) could improve robustness.
- Adaptation strategies: The study focuses on LoRA; other parameter‑efficient methods (prefix‑tuning, adapters, IA³) deserve systematic evaluation.
The authors provide the full experimental pipeline on GitHub, making it easy for practitioners to reproduce and extend the analysis.
Authors
- Omer Sela
Paper Information
- arXiv ID: 2603.03203v1
- Categories: cs.AI, cs.CL
- Published: March 3, 2026