[Paper] MaCo-GAN: Manifold-Contrastive Adversarial Learning for Single Image Super-Resolution

Published: June 3, 2026 at 12:29 PM EDT

Source: arXiv - 2606.05068v1

Overview

The paper introduces MaCo‑GAN, a training method for single‑image super‑resolution (SISR) models that reduces the “hallucinated” artifacts common in GAN‑based upscalers. It replaces the classic adversarial loss with a supervised contrastive objective. The discriminator no longer makes a single real/fake decision. Instead, it judges outputs against a graded set of fakes derived from the ground truth. This yields sharper, more realistic high‑resolution outputs at only a small cost in fidelity to the low‑resolution input.

Key Contributions

Manifold‑Contrastive GAN (MaCo‑GAN): Replaces the conventional adversarial loss with a contrastive loss that explicitly separates “on‑manifold” (plausible) and “off‑manifold” (implausible) fake samples.
Dynamic Fake Sample Synthesizer: Generates a continuum of fake HR images from the ground‑truth by applying controlled degradations, guaranteeing that every fake still corresponds to the same LR input.
Contrastive Minimax Game: Formulates the generator and discriminator objectives as a push‑pull contrastive problem. The generator is encouraged to move its outputs toward on‑manifold fakes and away from off‑manifold ones, while the discriminator works against this.
Drop‑in Replacement: The new loss can be swapped into existing SR pipelines (e.g., ESRGAN, RCAN) without architectural changes, delivering consistent perception‑distortion improvements across multiple datasets.
Extensive Ablations & Analysis: Provides thorough experiments that dissect the impact of each component (fake synthesis strength, contrastive temperature, batch composition) and visualizes the evolving feature space during training.

Methodology

Fake Sample Synthesis
- Starting from a high‑resolution ground‑truth (GT) image, the authors apply a set of stochastic degradations (blur, noise, compression) to create a spectrum of fake HR images.
- All fakes share the same low‑resolution (LR) counterpart, ensuring the conditional relationship (LR → HR) stays intact.
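Below is a minimal NumPy sketch of the degradation spectrum described above. It makes several assumptions:

- A single strength knob in [0, 1] controls both Gaussian blur and additive Gaussian noise.
- Compression artifacts are omitted.
- The function names, the strength range, and the 0.3 on/off-manifold threshold are hypothetical, not taken from the paper.

The sketch also does not enforce the paper's property that every fake corresponds to the same LR input.

```python
import numpy as np

def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D image, edge-padded so the size is preserved."""
    if sigma <= 0:
        return img.copy()
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")

    def conv(v):
        # 'valid' trims the padding back off, so each axis returns to its original length
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)   # blur along width
    return np.apply_along_axis(conv, 0, rows)     # blur along height

def synthesize_fakes(gt, strengths, rng, max_sigma=3.0, max_noise=0.1):
    """One degraded copy of `gt` per strength s in [0, 1]; larger s = stronger blur + noise."""
    fakes = []
    for s in strengths:
        out = blur(gt, s * max_sigma)
        out = out + rng.normal(0.0, s * max_noise, size=gt.shape)
        fakes.append(np.clip(out, 0.0, 1.0))
    return np.stack(fakes)

rng = np.random.default_rng(0)
gt = rng.random((32, 32))                  # stand-in HR ground truth in [0, 1]
strengths = np.linspace(0.05, 1.0, 8)      # continuum of degradation strengths
fakes = synthesize_fakes(gt, strengths, rng)
on_mask = strengths <= 0.3                 # hypothetical on/off-manifold threshold
on_fakes, off_fakes = fakes[on_mask], fakes[~on_mask]
```

A single scalar strength keeps the fakes ordered by distortion, so splitting them into on- and off-manifold sets reduces to a threshold.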
Contrastive Objective
- For each training step, a batch contains: the real GT, several on‑manifold fakes (low distortion), and several off‑manifold fakes (high distortion).
- The discriminator is trained with a supervised contrastive loss that pulls the embeddings of on‑manifold samples together and pushes off‑manifold embeddings away from that cluster.
- The generator plays the adversarial role: it tries to move the embedding of its SR output into the on‑manifold cluster and away from the off‑manifold one.
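The two objectives above can be sketched in NumPy under the following assumptions:

- **Discriminator side:** uses the standard supervised contrastive (SupCon) form of Khosla et al. (2020), with the GT and on‑manifold fakes sharing one label.
- **Generator side:** an InfoNCE‑style loss that anchors on the SR embedding, with on‑manifold fakes as positives and off‑manifold fakes as negatives.

MaCo‑GAN's exact formulation may differ, and the function names and toy embeddings here are illustrative.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    """Project embeddings onto the unit sphere (cosine-similarity space)."""
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

def supcon_loss(z, labels, tau=0.07):
    """Supervised contrastive loss (SupCon form) over embeddings z of shape (N, D)."""
    z = l2_normalize(z)
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / tau)   # drop each anchor from its own denominator
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    has_pos = n_pos > 0
    # Mean log-probability of each anchor's positives (np.where avoids 0 * -inf).
    pos_sum = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_sum[has_pos] / n_pos[has_pos]).mean())

def generator_contrastive_loss(z_sr, z_on, z_off, tau=0.07):
    """SR embedding is the anchor; on-manifold = positives, off-manifold = negatives."""
    z_sr, z_on, z_off = l2_normalize(z_sr), l2_normalize(z_on), l2_normalize(z_off)
    logits = np.concatenate([z_on @ z_sr, z_off @ z_sr]) / tau
    m = logits.max()
    log_denom = m + np.log(np.exp(logits - m).sum())
    return float(-(logits[: len(z_on)] - log_denom).mean())

# Toy embeddings: two tight clusters around orthogonal directions.
rng = np.random.default_rng(0)
d = 8
e_on, e_off = np.eye(d)[0], np.eye(d)[1]
z_on = e_on + 0.05 * rng.normal(size=(4, d))
z_off = e_off + 0.05 * rng.normal(size=(4, d))
z = np.vstack([z_on, z_off])
d_loss = supcon_loss(z, np.array([1] * 4 + [0] * 4))      # labels match clusters
d_loss_mixed = supcon_loss(z, np.array([1, 0] * 4))        # labels cut across clusters
g_loss_good = generator_contrastive_loss(e_on, z_on, z_off)
g_loss_bad = generator_contrastive_loss(e_off, z_on, z_off)
```

With well‑separated clusters, `supcon_loss` is small when the labels match the clusters and large when they cut across them. The generator loss is lowest when the SR embedding sits inside the on‑manifold cluster.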
Training Loop
- The generator produces an SR image from the LR input.
- The discriminator processes the SR image together with the synthesized fakes and the GT, computing the contrastive loss.
- Gradients are back‑propagated to both networks, forming a contrastive minimax game that replaces the usual GAN binary cross‑entropy loss.
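The per‑step data flow can be sketched as a batch‑assembly helper. The summary does not say how the SR output is labeled inside the discriminator's contrastive loss. This sketch assumes the usual adversarial convention of treating it as a negative. The label encoding and the function name are hypothetical.

```python
import numpy as np

ON, OFF = 1, 0  # hypothetical label encoding for the contrastive loss

def build_discriminator_batch(gt, sr, on_fakes, off_fakes):
    """Stack one step's HR-sized images; return (images, labels, sr_index).

    Assumptions (not confirmed by the summary): GT is grouped with the
    on-manifold fakes, and the discriminator treats the generator's SR
    output as an off-manifold (negative) sample.
    """
    images = np.concatenate([gt[None], on_fakes, off_fakes, sr[None]], axis=0)
    labels = np.concatenate([
        [ON],                            # real ground truth
        np.full(len(on_fakes), ON),      # low-distortion fakes
        np.full(len(off_fakes), OFF),    # high-distortion fakes
        [OFF],                           # generator output, as seen by the discriminator
    ]).astype(np.int64)
    sr_index = images.shape[0] - 1       # the generator's loss anchors on this row
    return images, labels, sr_index

rng = np.random.default_rng(0)
gt, sr = rng.random((32, 32)), rng.random((32, 32))
images, labels, sr_index = build_discriminator_batch(
    gt, sr, rng.random((3, 32, 32)), rng.random((4, 32, 32))
)
```

For the generator update, the same embeddings would be reused with the roles flipped. The row at `sr_index` is pulled toward the `ON` rows and away from the off‑manifold fakes.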
Integration
- The authors plug the MaCo‑GAN loss into several state‑of‑the‑art SR backbones, keeping all other hyper‑parameters unchanged, demonstrating the method’s plug‑and‑play nature.

Results & Findings

Model	PSNR (dB) ↑ / SSIM ↑	LPIPS ↓ (perceptual)	MOS ↑ (Mean Opinion Score)
ESRGAN	27.8 / 0.81	0.12	3.4
ESRGAN + MaCo‑GAN	27.5 / 0.80	0.09	3.9
RCAN	28.3 / 0.84	0.11	3.2
RCAN + MaCo‑GAN	28.0 / 0.83	0.08	3.8

Perception‑Distortion Trade‑off: For both backbones, MaCo‑GAN lowers LPIPS (better perceptual quality) and raises MOS while reducing PSNR/SSIM only modestly. This indicates that realism improves without a large fidelity penalty.
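The trade‑off can be quantified directly from the results table with a few lines of Python. The numbers are copied from the table rows, and nothing else is assumed.

```python
# (PSNR dB, SSIM, LPIPS, MOS) for (baseline, baseline + MaCo-GAN), copied from the table
results = {
    "ESRGAN": ((27.8, 0.81, 0.12, 3.4), (27.5, 0.80, 0.09, 3.9)),
    "RCAN": ((28.3, 0.84, 0.11, 3.2), (28.0, 0.83, 0.08, 3.8)),
}
metrics = ("PSNR", "SSIM", "LPIPS", "MOS")

def summarize(base, maco):
    """Absolute change per metric, plus the relative LPIPS reduction in percent."""
    delta = {m: round(after - before, 2) for m, before, after in zip(metrics, base, maco)}
    delta["LPIPS_rel_%"] = round(100 * (base[2] - maco[2]) / base[2], 1)
    return delta

summary = {model: summarize(*pair) for model, pair in results.items()}
for model, d in summary.items():
    print(model, d)
```

Both backbones lose about 0.3 dB PSNR and 0.01 SSIM. In exchange, LPIPS drops by roughly a quarter and MOS rises by 0.5 to 0.6 points.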
Ablation Insights:
- Removing the dynamic fake synthesizer (using only a single fake) degrades performance, confirming the need for a diverse fake manifold.
- Varying the contrastive temperature shows a sweet spot (τ≈0.07) where the discriminator’s feature space is neither too tight nor too loose.
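To see why the temperature matters, note that contrastive losses divide cosine similarities by τ before a softmax. The sketch below uses made‑up similarity values. It shows that a smaller τ concentrates probability on the most similar sample (a tight feature space), while a larger τ flattens the distribution. It illustrates the general mechanism only, not the paper's ablation.

```python
import numpy as np

def softmax_with_temperature(similarities, tau):
    """Softmax over cosine similarities scaled by 1/tau, as in contrastive losses."""
    logits = np.asarray(similarities, dtype=np.float64) / tau
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

sims = [0.9, 0.7, 0.5, 0.1]                # hypothetical anchor-to-sample cosine similarities
for tau in (0.02, 0.07, 0.5, 1.0):
    p = softmax_with_temperature(sims, tau)
    print(f"tau={tau:<4} max p={p.max():.3f} entropy={entropy(p):.3f}")
```

Entropy rises monotonically with τ. The reported sweet spot (τ≈0.07) sits between a near one‑hot distribution and a nearly uniform one.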
Feature Space Evolution: t‑SNE visualizations reveal that on‑manifold samples form a tight cluster that the generator gradually learns to inhabit, while off‑manifold samples remain well separated throughout training.

Practical Implications

Cleaner Upscaling in Production: Developers integrating SR into video streaming, gaming, or medical imaging pipelines can adopt MaCo‑GAN to reduce the “ghosting” and texture artifacts that often plague GAN‑based upscalers.
Plug‑and‑Play Upgrade: Since the method only swaps the loss function, existing SR models can adopt it without architectural changes, saving engineering effort. The network still has to be retrained (or fine‑tuned) with the new loss.
Better User Experience: For UI/UX teams, higher perceptual quality translates to sharper thumbnails, more realistic texture rendering in AR/VR, and improved visual fidelity in low‑bandwidth scenarios.
Potential for Edge Deployment: The contrastive loss is used only during training and is computationally comparable to a standard GAN loss. Training overhead is therefore modest and inference cost is unchanged, which is critical for on‑device SR on smartphones or embedded systems.

Limitations & Future Work

Synthetic Fake Diversity: The quality of the contrastive game hinges on the fake synthesizer’s ability to span the true distribution of plausible HR images. Hand‑crafted degradations may miss more complex real‑world artifacts (e.g., sensor noise patterns).
Scalability to Ultra‑High Res: Experiments stop at 4× upscaling (e.g., 540p → 2160p). It remains unclear how the method scales to 8× or 16× super‑resolution, where hallucination risk is higher.
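The scale factors translate into input sizes as follows. This is plain arithmetic for a 3840×2160 target, chosen only for illustration. Each LR pixel must be expanded into scale² output pixels: 16 at 4×, but 256 at 16×.

```python
def lr_size(hr_width, hr_height, scale):
    """LR input size needed to reach an HR target at an integer upscaling factor."""
    if hr_width % scale or hr_height % scale:
        raise ValueError("HR size must be divisible by the scale factor")
    return hr_width // scale, hr_height // scale

for scale in (4, 8, 16):
    w, h = lr_size(3840, 2160, scale)
    print(f"{scale:>2}x: {w}x{h} input, {scale * scale} output pixels per input pixel")
```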
Training Stability: While contrastive losses are less prone to mode collapse than binary GAN losses, the authors note occasional oscillations when the fake‑to‑real ratio is extreme, suggesting a need for adaptive sampling strategies.
Future Directions: The authors propose extending the fake synthesizer with learned degradation models, integrating perceptual metrics directly into the contrastive objective, and exploring multi‑modal contrastive setups for video SR where temporal consistency matters.

Authors

Daeyoung Han
Seongmin Hwang
Moongu Jeon

Paper Information

arXiv ID: 2606.05068v1
Categories: cs.CV
Published: June 3, 2026