[Paper] Visual Prompting Meets Feature Reconstruction-Based Anomaly Detection with Dual-Teacher Supervision

Published: June 8, 2026 at 11:52 AM EDT

Source: arXiv - 2606.09670v1

Overview

Recent anomaly detection methods achieve perfect detection and segmentation scores on well-established datasets such as MVTec. However, many of these methods struggle when their foundational assumptions (consistent object scale, viewpoint, background, illumination, and centered placement) are violated. Such variations make these methods unusable in many real-world scenarios. To address these limitations, we introduce three key contributions: (1) a visual prompting pipeline that isolates objects using foreground-background masking; (2) a mechanism for unfreezing the teacher in student-teacher models to improve domain adaptability; and (3) a data augmentation strategy that uses diffusion-generated synthetic images to enhance anomaly detection performance. Using the Masked Multiscale Reconstruction (MMR) model as our backbone, we achieve a 3.5 percentage point improvement over the previous state of the art on the challenging AeBAD dataset.
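As a rough illustration of the foreground-background masking idea in contribution (1), the sketch below blanks out every pixel outside a binary object mask before the image would be passed to a detector. It is not the authors' pipeline: the function name `apply_foreground_mask`, the `fill` parameter, and the toy data are hypothetical, and the summary does not say how the mask is produced (here it is assumed to be given).

```python
import numpy as np


def apply_foreground_mask(image: np.ndarray, mask: np.ndarray, fill: float = 0.0) -> np.ndarray:
    """Keep pixels where `mask` is True and replace the rest with `fill`.

    Hypothetical helper for illustration; assumes `mask` has the same
    height and width as `image`.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image's height and width")
    mask = mask.astype(bool)
    # Add a trailing axis so a 2-D mask broadcasts over the channels of an H x W x C image.
    if image.ndim == 3:
        mask = mask[..., None]
    return np.where(mask, image, fill).astype(image.dtype)


# Toy data: a uniform 4x4 RGB image with a 2x2 "object" region in the centre.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

masked = apply_foreground_mask(image, mask)
```

In this toy case only the four central pixels keep their original values. A constant fill is one simple choice for the suppressed background; the paper may handle it differently.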

Key Contributions

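The abstract names three contributions:

A visual prompting pipeline that isolates objects using foreground-background masking
A mechanism for unfreezing the teacher in student-teacher models to improve domain adaptability
A data augmentation strategy that uses diffusion-generated synthetic images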

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

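The work targets real-world inspection settings in which object scale, viewpoint, background, illumination, and placement vary, conditions under which many existing anomaly detection methods become unusable.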

Authors

Mateo Diaz-Bone
Daniel Caraballo
Florian Scheidegger
Thomas Frick
Mattia Rigotti
Andrea Bartezzaghi
Roy Assaf
Niccolo Avogaro
Yagmur G. Cinar
Brown Ebouky
Filip M. Janicki
Piotr S. Kluska
Cezary Skura
Cristiano Malossi

Paper Information

arXiv ID: 2606.09670v1
Categories: cs.CV, cs.AI
Published: June 8, 2026

Related posts

[Paper] Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

[Paper] DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?

[Paper] Illumination-Robust Camera-Based Heart-Rate Estimation for Physiological Sensing in Robots

[Paper] Atlas H&E-TME: Scalable AI-Based Tissue Profiling at Expert Pathologist-Level Accuracy