[Paper] CASR: A Robust Cyclic Framework for Arbitrary Large-Scale Super-Resolution with Distribution Alignment and Self-Similarity Awareness
Source: arXiv - 2602.22159v1
Overview
The paper introduces CASR, a cyclic super‑resolution (SR) framework that can upscale images to any factor with a single model. By treating extreme up‑scaling as a series of small, in‑distribution steps, CASR sharply reduces the noise, blur, and artifacts that typically appear when the target scale falls outside the training range.
Key Contributions
- Cyclic SR formulation – Re‑expresses arbitrary‑scale up‑sampling as a chain of modest, in‑distribution magnifications, enabling stable inference with one network.
- Structural Distribution Alignment Module (SDAM) – Uses super‑pixel aggregation to align feature distributions across iterations, preventing drift and error accumulation.
- Self‑Similarity Aware Restoration Module (SARM) – Enforces autocorrelation constraints and injects low‑resolution (LR) self‑similarity priors to recover high‑frequency textures.
- Single‑model, multi‑scale solution – No need for separate models or scale‑specific finetuning; the same weights handle 2×, 4×, 16×, or even 64× up‑scaling.
- State‑of‑the‑art performance on standard benchmarks, especially at extreme magnifications where prior methods collapse.
Methodology
Cyclic Upscaling Loop
- Instead of a one‑shot jump from LR to the desired HR size, the image is repeatedly passed through the SR network, each time enlarging it by a modest factor (e.g., 1.5×–2×).
- This keeps every intermediate output within the distribution the model was trained on, avoiding the “out‑of‑distribution” shock that causes artifacts.
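The loop above can be sketched as follows. `plan_cycles` and `cyclic_upscale` are hypothetical helpers (not from the paper's code) that only track spatial size, assuming the single SR network is applied once per cycle with equal per‑cycle factors capped at 2×:

```python
import math

def plan_cycles(target_scale, max_step=2.0):
    """Split an arbitrary target scale into equal, modest per-cycle
    factors, each <= max_step, so every intermediate output stays near
    the scales seen during training. Hypothetical helper, not the
    authors' released code."""
    # small epsilon guards against floating-point log round-up
    n = max(1, math.ceil(math.log(target_scale, max_step) - 1e-9))
    return [target_scale ** (1.0 / n)] * n

def cyclic_upscale(size_hw, target_scale):
    """Run the (stand-in) SR step once per planned cycle; here we only
    track the growing spatial size to illustrate the loop."""
    h, w = size_hw
    for s in plan_cycles(target_scale):
        h, w = round(h * s), round(w * s)  # one pass of the single SR network
    return h, w

print(plan_cycles(16))            # four 2x cycles for a 16x target
print(cyclic_upscale((32, 32), 16))
```

For instance, a 16× target decomposes into four 2× cycles, so a 32×32 input reaches 512×512 without any single step leaving the training distribution.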
Structural Distribution Alignment Module (SDAM)
- The feature map from the current iteration is segmented into super‑pixels (coherent regions).
- Statistics (mean, variance) of each super‑pixel are aligned to those of the previous iteration, effectively “re‑centering” the distribution and stopping drift.
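A minimal single‑channel sketch of this re‑centering step, assuming super‑pixel labels are already computed and the previous iteration's statistics are passed in as plain dicts (`align_superpixel_stats` is an illustrative name, not the paper's API):

```python
import numpy as np

def align_superpixel_stats(feat, labels, ref_mean, ref_std, eps=1e-6):
    """Re-center each super-pixel region of `feat` to reference statistics.

    feat:    (H, W) feature channel
    labels:  (H, W) integer super-pixel ids
    ref_mean, ref_std: dicts mapping id -> statistics from the previous
    cycle. Simplified sketch of the SDAM alignment idea.
    """
    out = feat.astype(np.float64).copy()
    for sp in np.unique(labels):
        mask = labels == sp
        mu, sd = out[mask].mean(), out[mask].std()
        # normalize the region, then re-scale/shift to the reference stats
        out[mask] = (out[mask] - mu) / (sd + eps) * ref_std[sp] + ref_mean[sp]
    return out
```

Matching each region's mean and variance to the previous cycle keeps the feature distribution from drifting as cycles accumulate.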
Self‑Similarity Aware Restoration Module (SARM)
- Computes an autocorrelation map of the LR input to capture repeating patterns (textures, edges).
- During each up‑sampling step, SARM injects these self‑similarity cues back into the feature space, encouraging the network to reproduce realistic high‑frequency details rather than hallucinating noise.
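The autocorrelation map itself can be computed cheaply via the FFT (Wiener–Khinchin theorem); peaks away from the origin reveal repeating textures. This is an illustrative computation, not the paper's exact module:

```python
import numpy as np

def autocorrelation_map(lr, eps=1e-8):
    """Normalized circular autocorrelation of an LR image via the FFT.

    Peaks at non-zero lags indicate repeating patterns (textures, edges)
    that a SARM-style module could inject as self-similarity priors.
    """
    x = lr - lr.mean()                       # remove DC so flat areas score 0
    f = np.fft.fft2(x)
    ac = np.fft.ifft2(f * np.conj(f)).real   # Wiener-Khinchin: |F|^2 -> AC
    peak = ac.flat[0]                        # zero-lag value before shifting
    return np.fft.fftshift(ac) / (peak + eps)  # center zero lag, normalize to 1
```

On a perfectly periodic stripe pattern, the map is 1 at the zero lag and again at every multiple of the stripe period, which is exactly the repetition cue SARM exploits.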
Training
- The network is trained on a conventional range of scales (e.g., 1×–4×).
- Losses combine pixel‑wise L1/L2, perceptual (VGG) loss, and a novel distribution‑alignment loss that penalizes divergence between successive super‑pixel statistics.
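The alignment term can be sketched as below. The paper's exact divergence is not reproduced here, so squared differences between successive super‑pixel statistics stand in for it, and the perceptual (VGG) term is omitted to keep the sketch dependency‑free:

```python
import numpy as np

def distribution_alignment_loss(stats_prev, stats_curr):
    """Penalize drift between successive cycles' super-pixel statistics.

    stats_*: dict mapping super-pixel id -> (mean, variance).
    Squared differences are an assumed stand-in for the paper's
    divergence term.
    """
    loss = 0.0
    for sp in stats_prev:
        m0, v0 = stats_prev[sp]
        m1, v1 = stats_curr[sp]
        loss += (m1 - m0) ** 2 + (v1 - v0) ** 2
    return loss / max(len(stats_prev), 1)

def total_loss(pred, target, stats_prev, stats_curr, w_pix=1.0, w_align=0.1):
    """Pixel-wise L1 plus the alignment term (weights are illustrative)."""
    l1 = np.abs(pred - target).mean()
    return w_pix * l1 + w_align * distribution_alignment_loss(stats_prev, stats_curr)
```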
Results & Findings
| Method (×4 SR) | PSNR ↑ | SSIM ↑ |
|---|---|---|
| CASR (single model) | 31.8 dB | 0.894 |
| Prior art (multi‑model) | 30.5 dB | 0.877 |
| Baseline cyclic (no SDAM/SARM) | 30.9 dB | 0.882 |
- Extreme scales (×16, ×32, ×64): CASR retains visual fidelity, while competing methods exhibit severe blur and ringing.
- Distribution drift measured by KL‑divergence between successive iterations drops by ~45 % thanks to SDAM.
- Texture consistency (measured via autocorrelation similarity) improves by ~20 % with SARM, confirming that self‑similarity priors are effectively leveraged.
Qualitative examples show clean edge reconstruction and plausible fine‑grained patterns (e.g., fabric weave, foliage) even at 64× magnification.
Practical Implications
- Single‑model deployment – Developers can ship one lightweight SR service that handles any client‑requested zoom level, simplifying CI/CD pipelines and reducing memory footprints.
- Real‑time streaming & VR – The cyclic approach can be throttled adaptively: fewer iterations for low‑latency scenarios, more for high‑quality offline rendering.
- Legacy image restoration – Archivists can upscale historical photos to very high resolutions without training a bespoke model for each target scale.
- Edge devices – Because each iteration works on a modest up‑scale factor, the per‑step compute stays bounded, making it feasible to run on mobile GPUs or NPUs with progressive refinement.
Limitations & Future Work
- Inference latency grows linearly with the number of cycles; extremely high magnifications still require many passes, which may be prohibitive for ultra‑low‑latency use cases.
- The current SDAM relies on super‑pixel segmentation, which adds a preprocessing overhead and may struggle with highly textured or noisy inputs.
- Authors note that extending the framework to video SR (temporal consistency) and exploring learned adaptive cycle lengths are promising directions for follow‑up research.
Authors
- Wenhao Guo
- Zhaoran Zhao
- Peng Lu
- Sheng Li
- Qian Qiao
- RuiDe Li
Paper Information
- arXiv ID: 2602.22159v1
- Categories: cs.CV
- Published: February 25, 2026