[Paper] Addressing Image Authenticity When Cameras Use Generative AI
Source: arXiv - 2604.21879v1
Overview
Recent camera pipelines are beginning to embed generative‑AI (GenAI) modules directly into the image signal processor (ISP). While this can improve low‑light performance or enable AI‑based digital zoom, it also introduces the risk that the final JPEG coming out of the camera contains hallucinated (synthetically added) details. The paper Addressing Image Authenticity When Cameras Use Generative AI proposes a lightweight, post‑capture technique that lets users retrieve the “un‑hallucinated” version of a photo, restoring confidence in what the sensor actually recorded.
Key Contributions
- Self‑contained recovery pipeline – An image‑specific MLP decoder paired with a modality‑specific encoder can reconstruct the pre‑hallucination image using only the final camera output.
- Tiny footprint – The combined encoder/decoder model occupies just ~180 KB, small enough to be stored as metadata inside standard JPEG/HEIC containers.
- No ISP access required – The method works entirely after capture; it does not need any privileged access to the camera’s internal processing chain.
- Generalizable to multiple hallucination modalities – Demonstrated on AI‑enhanced digital zoom, low‑light denoising, and edge‑enhancement pipelines.
- Open‑source implementation & benchmark – The authors release code and a dataset of paired “raw‑before‑hallucination” / “post‑hallucination” images for reproducibility.
Methodology
- Data collection – The authors capture pairs of images: the raw sensor output (ground truth) and the same image after it passes through a GenAI‑augmented ISP.
- Modality‑specific encoder – A shallow convolutional encoder extracts a compact latent code (≈ 64 bytes) that describes the hallucination style applied by the ISP (e.g., low‑light boost, AI zoom).
- Image‑specific MLP decoder – For each input photo, a small multi‑layer perceptron (MLP) is optimized per image to map the latent code and the hallucinated pixel values back to the original sensor values. The optimization runs for a few hundred iterations, taking under 2 seconds on a modern laptop.
- Embedding as metadata – The encoder weights and the learned MLP parameters are serialized and stored in the EXIF block of the output JPEG/HEIC file.
- Recovery – When a user (or downstream application) wants the authentic image, the embedded model is loaded, the MLP is executed on the hallucinated image, and the pre‑hallucination version is reconstructed on‑the‑fly.
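The recovery step above can be sketched in a few lines of numpy. Note that the latent size (8), the hidden widths (64), and the ReLU activation are illustrative assumptions for this sketch, not the paper's exact decoder architecture:

```python
import numpy as np

def recover(pixels, latent, params):
    """Apply the embedded MLP decoder to hallucinated pixel values.

    pixels: (N, 3) RGB values in [0, 1] from the hallucinated image.
    latent: per-image latent code describing the hallucination style.
    params: list of (W, b) layer weights loaded from the image metadata.
    """
    # Condition every pixel on the same per-image latent code.
    z = np.broadcast_to(latent, (pixels.shape[0], latent.shape[0]))
    h = np.concatenate([pixels, z], axis=1)
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)        # ReLU on hidden layers only
    return np.clip(h, 0.0, 1.0)           # reconstructed sensor values

# Hypothetical shapes: 8-dim latent, two 64-unit hidden layers (assumptions).
rng = np.random.default_rng(0)
latent = rng.normal(size=8)
params = [
    (rng.normal(0, 0.1, (11, 64)), np.zeros(64)),  # 3 pixel + 8 latent inputs
    (rng.normal(0, 0.1, (64, 64)), np.zeros(64)),
    (rng.normal(0, 0.1, (64, 3)), np.zeros(3)),
]
hallucinated = rng.random((16 * 16, 3))            # flattened 16x16 crop
recovered = recover(hallucinated, latent, params)
print(recovered.shape)                             # (256, 3)
```

In the real pipeline the weights in `params` would be deserialized from the EXIF block rather than sampled randomly; the forward pass itself is this simple.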
The whole pipeline is deliberately kept simple: a few convolutional layers for encoding and a 3‑layer MLP (≈ 150 K parameters) for decoding, making it feasible on mobile CPUs and even microcontrollers.
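As a quick sanity check on the stated footprint (the hidden width of 384 here is an assumption chosen to land near the reported ~150 K parameters; the paper's summary does not list exact layer sizes):

```python
# Back-of-envelope: parameter count and serialized size of a 3-layer MLP.
def mlp_param_count(sizes):
    """Total weights + biases for a fully connected net with the given sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

sizes = [3, 384, 384, 3]           # RGB in -> two hidden layers -> RGB out
n_params = mlp_param_count(sizes)
print(n_params)                    # 150531, close to the stated ~150 K

# Stored as float32 the decoder alone would be ~588 KB; roughly one byte
# per parameter (e.g., int8 quantization) is what brings the combined
# encoder/decoder metadata near the reported ~180 KB.
print(n_params * 4 // 1024, "KB at float32")   # 588 KB
print(n_params * 1 // 1024, "KB at int8")      # 147 KB
```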
Results & Findings
| Hallucination type | PSNR gain (recovered vs. hallucinated) | SSIM improvement | Visual fidelity |
|---|---|---|---|
| AI‑based digital zoom (4×) | +6.8 dB | +0.12 | Restores true object boundaries, removes “ghost” textures |
| Low‑light enhancement | +5.4 dB | +0.09 | Recovers realistic noise pattern, eliminates over‑bright spots |
| Edge‑enhancement | +4.2 dB | +0.07 | Removes oversharpened halos while preserving true edges |
Across all tested modalities, the recovered images are statistically indistinguishable from the ground‑truth raw captures (no significant difference at the 0.01 level). Importantly, the reconstruction runs in under 30 ms on a Snapdragon 8 Gen 2 CPU, confirming real‑time feasibility.
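The PSNR figures in the table follow the standard definition; a minimal numpy sketch (the gain reported above is presumably PSNR(recovered, raw) minus PSNR(hallucinated, raw), each measured against the raw capture):

```python
import numpy as np

def psnr(img, ref, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two same-shape images."""
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 20 * np.log10(max_val) - 10 * np.log10(mse)

# Toy check: a uniform +1 error gives MSE = 1, so PSNR = 20*log10(255).
ref = np.zeros((8, 8), dtype=np.uint8)
img = ref + 1
print(round(psnr(img, ref), 2))   # 48.13
```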
Practical Implications
- For developers of camera apps – Embedding the encoder/decoder metadata lets you offer a “view original sensor data” toggle without exposing raw sensor files or requiring privileged ISP hooks.
- For forensic and compliance tools – The method provides a verifiable chain of custody: the recovered image can be audited against the hallucinated version, helping regulators detect manipulated content.
- For device manufacturers – Adding ~180 KB of metadata is negligible compared to typical image sizes (2–8 MB), yet it gives end‑users transparency about AI‑driven post‑processing.
- For cloud‑based photo services – The lightweight model can be executed server‑side to automatically flag images that have undergone aggressive AI enhancements, enabling better content moderation.
- For open‑source camera pipelines (e.g., OpenCV, libcamera) – The approach can be integrated as a post‑processing plugin, giving hobbyists the same authenticity guarantees as flagship phones.
Limitations & Future Work
- Per‑image optimization cost – Although fast on modern hardware, the need to train an MLP for each photo adds latency on low‑power devices (e.g., wearables).
- Scope of hallucination types – The paper focuses on three ISP modules; more exotic generative pipelines (e.g., style transfer, background replacement) may require richer encoder representations.
- Assumption of deterministic ISP – If the ISP includes stochastic elements (e.g., random noise injection), exact recovery becomes probabilistic.
- Future directions suggested by the authors include: (1) learning a universal decoder that generalizes across images, eliminating per‑image training; (2) compressing the metadata further via quantization; and (3) extending the framework to video streams where temporal consistency is crucial.
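Direction (2) amounts to standard weight quantization. A minimal symmetric int8 sketch in numpy (the scheme here is a common choice, not necessarily the authors'):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric affine quantization: float32 weights -> int8 plus a scale."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, 1000).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(q.nbytes, "bytes vs", w.nbytes)   # 4x smaller
print(err <= s / 2 + 1e-6)              # rounding error <= half a quant step
```

A 4x size reduction with bounded per-weight error is why quantization is a natural fit for shrinking per-image metadata further.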
Authors
- Umar Masud
- Abhijith Punnappurath
- Luxi Zhao
- David B. Lindell
- Michael S. Brown
Paper Information
- arXiv ID: 2604.21879v1
- Categories: cs.CV, cs.AI
- Published: April 23, 2026