[Paper] Identifying Models Behind Text-to-Image Leaderboards
Source: arXiv - 2601.09647v1
Overview
The paper Identifying Models Behind Text‑to‑Image Leaderboards exposes a hidden privacy flaw in the way popular text‑to‑image (T2I) model leaderboards are run. While these leaderboards hide the model name to keep the competition fair, the authors demonstrate that the visual “fingerprint” of each model can be recovered automatically, effectively de‑anonymizing the submissions. This finding has immediate consequences for how we evaluate, share, and protect generative AI systems.
Key Contributions
- Model fingerprinting in image space: Shows that outputs from a given T2I model cluster tightly in a high‑dimensional embedding space, creating a distinctive signature.
- Simple, prompt‑agnostic deanonymization: Introduces a centroid‑based classifier that can identify the source model with >90 % accuracy across 22 models and 150 K generated images, without needing to know the prompts or training data.
- Prompt‑level distinguishability metric: Proposes a quantitative measure of how “identifiable” a prompt is, revealing that some prompts make models almost trivially separable.
- Large‑scale empirical analysis: Evaluates the method on a diverse set of models (diffusion, latent diffusion, GLIDE, etc.) and prompts, confirming the robustness of the fingerprinting effect.
- Security recommendations: Highlights the need for stronger anonymization techniques and suggests concrete defenses (e.g., adding noise, style‑transfer post‑processing).
Methodology
- Data collection: The authors generated 150 K images using 22 publicly available T2I models on a shared pool of 280 prompts (covering a wide range of subjects, styles, and complexities).
- Embedding extraction: Each image was passed through a pre‑trained CLIP vision encoder, producing a 512‑dimensional vector that captures semantic content while being relatively model‑agnostic.
- Centroid construction: For every model, the mean (centroid) of all its image embeddings was computed.
- Deanonymization classifier: A new image is assigned to the model whose centroid is closest in cosine distance. No additional training or prompt information is required (a minimal sketch of this pipeline appears after this list).
- Prompt‑level analysis: The authors compute a distinguishability score for each prompt by measuring the separation between model clusters when that prompt is used.
- Evaluation: Accuracy, precision, and recall are reported across multiple splits, and ablation studies test the impact of embedding model, number of prompts, and image resolution.
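Below is a minimal sketch of the pipeline described in this section, assuming an off‑the‑shelf CLIP ViT‑B/32 checkpoint from Hugging Face and plain NumPy for the nearest‑centroid rule. The checkpoint name, helper names, and the distinguishability formulation are illustrative assumptions, not the authors’ exact implementation.

```python
# Sketch of centroid-based model identification from CLIP image embeddings.
# Assumptions (not from the paper): the CLIP checkpoint, the function names, and
# the per-prompt distinguishability formulation at the bottom.
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
MODEL_ID = "openai/clip-vit-base-patch32"  # yields 512-dimensional image embeddings
processor = CLIPProcessor.from_pretrained(MODEL_ID)
clip = CLIPModel.from_pretrained(MODEL_ID).to(DEVICE).eval()


@torch.no_grad()
def embed(image: Image.Image) -> np.ndarray:
    """Return an L2-normalized 512-d CLIP embedding for one image."""
    inputs = processor(images=image.convert("RGB"), return_tensors="pt").to(DEVICE)
    feats = clip.get_image_features(**inputs)            # shape (1, 512)
    feats = feats / feats.norm(dim=-1, keepdim=True)     # unit length: cosine similarity == dot product
    return feats.squeeze(0).cpu().numpy()


def build_centroids(embeddings_by_model: dict) -> dict:
    """One fingerprint per model: the mean of its image embeddings, re-normalized."""
    centroids = {}
    for name, embs in embeddings_by_model.items():        # embs: array of shape (n_images, 512)
        c = np.asarray(embs).mean(axis=0)
        centroids[name] = c / np.linalg.norm(c)
    return centroids


def identify(image: Image.Image, centroids: dict) -> str:
    """Assign an anonymous image to the model whose centroid is nearest in cosine distance."""
    e = embed(image)
    return max(centroids, key=lambda name: float(e @ centroids[name]))


def prompt_distinguishability(per_prompt_embs: dict, centroids: dict) -> float:
    """One plausible per-prompt score (the paper's exact metric may differ):
    the fraction of this prompt's images whose nearest centroid is their true source model."""
    correct, total = 0, 0
    for true_model, embs in per_prompt_embs.items():
        for e in np.asarray(embs):
            pred = max(centroids, key=lambda name: float(e @ centroids[name]))
            correct += int(pred == true_model)
            total += 1
    return correct / total
```

In practice, each leaderboard image is embedded once, embeddings are grouped by source model to form the centroids, and identify() is then applied to anonymous outputs; the attack needs no prompts, model weights, or extra training.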
Results & Findings
- High deanonymization accuracy: The centroid classifier correctly identified the source model for 92 % of test images (top‑1) and 98 % when allowing a top‑3 guess.
- Distinctive model signatures: Even models that share architecture or training data (e.g., two versions of Stable Diffusion) formed separable clusters, suggesting subtle implementation‑level differences (sampling schedule, tokenizer tweaks, etc.).
- Prompt influence: Certain prompts (e.g., “a photo of a red apple on a wooden table”) yielded near‑perfect distinguishability (>99 % accuracy), while others (abstract scenes) produced much lower scores.
- Robustness to transformations: Simple post‑processing (cropping, JPEG compression) reduced accuracy only modestly (down to ~85 %), indicating that the fingerprint persists through typical image‑hosting pipelines (see the perturbation sketch after this list).
- Scalability: Adding more models only marginally decreased performance, implying the approach scales to larger leaderboards.
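To probe the robustness finding, one can perturb images the way an image‑hosting pipeline might and re‑run identification. The sketch below is illustrative only: the crop ratio and JPEG quality are assumptions rather than the paper’s settings, and identify() refers to the pipeline sketch in the Methodology section.

```python
# Simulate a typical hosting pipeline (mild center crop + lossy JPEG re-encoding)
# and check whether the centroid classifier still recovers the source model.
import io

from PIL import Image


def hosting_pipeline(image: Image.Image, crop_ratio: float = 0.9, jpeg_quality: int = 75) -> Image.Image:
    """Center-crop the image, then round-trip it through lossy JPEG compression."""
    rgb = image.convert("RGB")
    w, h = rgb.size
    cw, ch = int(w * crop_ratio), int(h * crop_ratio)
    left, top = (w - cw) // 2, (h - ch) // 2
    cropped = rgb.crop((left, top, left + cw, top + ch))
    buf = io.BytesIO()
    cropped.save(buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")


# Usage, reusing identify() and the per-model centroids from the earlier sketch:
# still_correct = identify(hosting_pipeline(img), centroids) == true_model
```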
Practical Implications
- Leaderboard design: Organizers must rethink anonymity. Simple shuffling of outputs is insufficient; additional steps such as adding stochastic visual noise, applying style transfer, or using multiple “cover” models may be required (a toy noise‑injection sketch follows this list).
- Model provenance tracking: The fingerprinting technique could be repurposed as a forensic tool to detect unauthorized reuse of proprietary T2I models in the wild.
- Competitive fairness: Developers can no longer rely on blind voting to hide implementation details; prompt choices, whether strategic or incidental, can reveal a model’s identity.
- Privacy & IP concerns: Companies that license T2I models may need to embed protective transformations to prevent competitors from reverse‑engineering their model signatures.
- Benchmark reproducibility: Researchers should disclose the embedding model and clustering method used for any anonymity claim, enabling reproducible security assessments.
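As a concrete starting point for the “stochastic visual noise” defense mentioned under Leaderboard design, here is a toy sketch; the Gaussian noise model and the sigma value are illustrative assumptions, and the paper only evaluates such defenses preliminarily.

```python
# Toy defense: inject stochastic pixel noise before images are displayed,
# aiming to blur the model fingerprint while keeping the image presentable.
import numpy as np
from PIL import Image


def add_pixel_noise(image: Image.Image, sigma: float = 8.0, seed=None) -> Image.Image:
    """Add zero-mean Gaussian noise (std `sigma`, in 0-255 pixel units) to every channel."""
    rng = np.random.default_rng(seed)
    arr = np.asarray(image.convert("RGB"), dtype=np.float32)
    noisy = arr + rng.normal(0.0, sigma, size=arr.shape)
    return Image.fromarray(np.clip(noisy, 0.0, 255.0).astype(np.uint8))
```

Sweeping sigma while tracking both identification accuracy (via identify() from the Methodology sketch) and an image‑quality score would quantify the anonymity‑versus‑quality trade‑off that the Limitations section flags as open.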
Limitations & Future Work
- Dependence on CLIP embeddings: The study uses a single vision encoder; alternative embeddings (e.g., DINO, ViT‑G) might affect fingerprint strength.
- Prompt pool bias: Although 280 prompts are diverse, they may not cover niche domains where models behave more similarly.
- Defensive strategies not fully evaluated: Proposed anonymization tricks (noise injection, style transfer) are only preliminarily tested; systematic evaluation of their trade‑offs (image quality vs. anonymity) remains open.
- Cross‑modal attacks: The paper focuses on image‑only deanonymization; extending the analysis to video or multimodal outputs could reveal further vulnerabilities.
Bottom line: The work shines a light on an overlooked security dimension of generative AI evaluation. For developers, researchers, and platform operators, it’s a call to embed stronger privacy safeguards into the very pipelines that showcase our most impressive AI creations.
Authors
- Ali Naseh
- Yuefeng Peng
- Anshuman Suri
- Harsh Chaudhari
- Alina Oprea
- Amir Houmansadr
Paper Information
- arXiv ID: 2601.09647v1
- Categories: cs.CV, cs.CR, cs.LG
- Published: January 14, 2026