[Paper] A Multi-Domain Benchmark for Detecting AI-Generated Text-Rich Images from GPT-Image-2

Published: June 17, 2026 at 12:37 PM EDT

Source: arXiv - 2606.19259v1

Overview

Text-rich images often contain privacy-sensitive, transactional, or decision-relevant information. As recent multimodal image generation models become increasingly capable of synthesizing realistic textual content and structured visual designs, detecting AI-generated text-rich images has become an important challenge for digital trust and content authenticity. Existing benchmarks, however, largely focus on object-centric images and provide limited coverage of scenarios where textual semantics and layout organization are central. In this paper, we introduce a multi-domain benchmark for detecting text-rich images generated by OpenAI’s GPT Image 2. The benchmark contains 8,602 images across six representative categories: commercial posters, infographics, academic posters, receipts, tables, and UI screenshots. Using this benchmark, we evaluate five representative AI-generated image detectors in a zero-shot setting and analyze their overall, category-wise, and post-processing robustness. Our results show that detector performance is highly domain-dependent: methods that perform well in some categories often fail on others, and even the strongest conventional detector exhibits severe sensitivity to JPEG compression. We further conduct an exploratory evaluation with a multimodal vision-language model, revealing both its promise and its limitations on structured formats. These findings highlight the need for text- and layout-aware detection methods for modern AI-generated images. Our dataset is released at XXX.

Key Contributions

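Based on the abstract, the paper's main contributions are:

A multi-domain benchmark of 8,602 text-rich images generated by OpenAI's GPT Image 2, spanning six categories: commercial posters, infographics, academic posters, receipts, tables, and UI screenshots.
A zero-shot evaluation of five representative AI-generated image detectors, analyzed overall, per category, and under post-processing.
An exploratory evaluation of a multimodal vision-language model on the same task.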

Methodology

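According to the abstract, the five detectors are applied to the benchmark in a zero-shot setting, meaning they are not trained on the benchmark data. Results are broken down by category, and robustness is checked against post-processing such as JPEG compression. Please refer to the full paper for detailed methodology.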
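The abstract describes analyzing detector performance both overall and per category. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. It assumes each image comes with a category name, a ground-truth label, and a detector score in [0, 1]. The function name `category_accuracy`, the 0.5 threshold, and the toy records are all hypothetical. The metrics actually used in the paper are not stated here.

```python
from collections import defaultdict


def category_accuracy(records, threshold=0.5):
    """Compute overall and per-category accuracy.

    records: iterable of (category, is_ai_generated, score) tuples, where
    score is a detector's confidence that the image is AI-generated.
    The tuple layout and the threshold are assumptions for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, is_ai_generated, score in records:
        predicted_ai = score >= threshold
        total[category] += 1
        correct[category] += int(predicted_ai == is_ai_generated)

    if not total:
        raise ValueError("no records to evaluate")

    per_category = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_category


if __name__ == "__main__":
    # Made-up scores, for illustration only; not results from the paper.
    toy_records = [
        ("receipts", True, 0.9),
        ("receipts", True, 0.2),
        ("tables", True, 0.8),
        ("tables", False, 0.1),
    ]
    overall, per_category = category_accuracy(toy_records)
    print(f"overall: {overall:.2f}")
    for name, acc in sorted(per_category.items()):
        print(f"{name}: {acc:.2f}")
```

Reporting the per-category dictionary next to the overall number is what makes domain dependence visible: a detector can look acceptable on average while failing on a single category such as receipts.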

Practical Implications

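The results suggest that current detectors cannot be relied on uniformly across text-rich domains: methods that perform well in some categories often fail on others, and even the strongest conventional detector is severely sensitive to JPEG compression. The authors conclude that text- and layout-aware detection methods are needed for modern AI-generated images.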

Authors

Yijin Wang
Shuyi Wang
Wenhan Zhang
Yuqi Ouyang

Paper Information

arXiv ID: 2606.19259v1
Categories: cs.CV, cs.AI
Published: June 17, 2026
