[Paper] CASR: A Robust Cyclic Framework for Arbitrary Large-Scale Super-Resolution with Distribution Alignment and Self-Similarity Awareness

Published: February 25, 2026 at 01:05 PM EST
4 min read
Source: arXiv - 2602.22159v1

Overview

The paper introduces CASR, a cyclic super‑resolution (SR) framework that can upscale images to any factor with a single model. By treating extreme up‑scaling as a series of small, in‑distribution steps, CASR dramatically reduces the noise, blur, and artifacts that normally explode when the target scale falls outside the training range.

Key Contributions

  • Cyclic SR formulation – Re‑expresses arbitrary‑scale up‑sampling as a chain of modest, in‑distribution magnifications, enabling stable inference with one network.
  • Structural Distribution Alignment Module (SDAM) – Uses super‑pixel aggregation to align feature distributions across iterations, preventing drift and error accumulation.
  • Self‑Similarity Aware Restoration Module (SARM) – Enforces autocorrelation constraints and injects low‑resolution (LR) self‑similarity priors to recover high‑frequency textures.
  • Single‑model, multi‑scale solution – No need for separate models or scale‑specific fine‑tuning; the same weights handle 2×, 4×, 16×, or even 64× up‑scaling.
  • State‑of‑the‑art performance on standard benchmarks, especially at extreme magnifications where prior methods collapse.

Methodology

  1. Cyclic Upscaling Loop

    • Instead of a one‑shot jump from LR to the desired HR size, the image is repeatedly passed through the SR network, each time enlarging it by a modest factor (e.g., 1.5×–2×).
    • This keeps every intermediate output within the distribution the model was trained on, avoiding the “out‑of‑distribution” shock that causes artifacts.
  2. Structural Distribution Alignment Module (SDAM)

    • The feature map from the current iteration is segmented into super‑pixels (coherent regions).
    • Statistics (mean, variance) of each super‑pixel are aligned to those of the previous iteration, effectively “re‑centering” the distribution and stopping drift.
  3. Self‑Similarity Aware Restoration Module (SARM)

    • Computes an autocorrelation map of the LR input to capture repeating patterns (textures, edges).
    • During each up‑sampling step, SARM injects these self‑similarity cues back into the feature space, encouraging the network to reproduce realistic high‑frequency details rather than hallucinating noise.
  4. Training

    • The network is trained on a conventional range of scales (e.g., 1×–4×).
    • Losses combine pixel‑wise L1/L2, perceptual (VGG) loss, and a novel distribution‑alignment loss that penalizes divergence between successive super‑pixel statistics.
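The cyclic decomposition in step 1 amounts to a simple scheduling computation: split the requested magnification into equal per-cycle factors, each no larger than the trained range. The sketch below (the helper name and the 2× cap are illustrative assumptions, not the paper's implementation) shows the idea:

```python
import math

def cycle_schedule(target_scale: float, max_step: float = 2.0):
    """Split an arbitrary magnification into equal in-distribution factors.

    Illustrative helper (not from the paper): each cycle enlarges by at
    most `max_step`, so every intermediate output stays inside the range
    the SR network saw during training.
    """
    if target_scale <= 1.0:
        return [target_scale]
    # number of cycles needed; small epsilon guards against float round-off
    n = max(1, math.ceil(math.log(target_scale) / math.log(max_step) - 1e-9))
    step = target_scale ** (1.0 / n)  # equal factors whose product is the target
    return [step] * n

print(cycle_schedule(64))  # six equal factors of ~2x
```

A 64× request thus becomes six ~2× cycles, and a fractional request such as 3.5× becomes two ~1.87× cycles, both inside a 1×–2× comfort zone.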
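Step 2's per-region statistic alignment can be approximated in a few lines of NumPy. The function name, the `(mean, std)` dictionary, and the integer label map below are assumptions for illustration; the real SDAM operates on learned feature maps:

```python
import numpy as np

def align_superpixel_stats(feat, labels, ref_stats, eps=1e-5):
    """Moment-match each super-pixel region of `feat` to reference stats.

    Simplified stand-in for SDAM: `labels` assigns every pixel a
    super-pixel id, and `ref_stats[id] = (mean, std)` holds statistics
    recorded at the previous cycle. Re-centering each region keeps the
    feature distribution from drifting across iterations.
    """
    out = feat.astype(np.float64).copy()
    for sp_id, (ref_mu, ref_sigma) in ref_stats.items():
        mask = labels == sp_id
        region = out[mask]
        mu, sigma = region.mean(), region.std()
        out[mask] = (region - mu) / (sigma + eps) * ref_sigma + ref_mu
    return out
```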
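The autocorrelation map in step 3 can be computed efficiently via the Wiener–Khinchin theorem (the autocorrelation is the inverse FFT of the power spectrum). This standalone sketch is not the paper's SARM module, just the map it builds on:

```python
import numpy as np

def autocorrelation_map(img):
    """Normalized circular autocorrelation of a 2-D image.

    Computed as the inverse FFT of the power spectrum. Off-origin peaks
    reveal repeating patterns (fabric weave, foliage) that a
    self-similarity prior can exploit.
    """
    x = img - img.mean()
    spectrum = np.fft.fft2(x)
    ac = np.fft.ifft2(spectrum * np.conj(spectrum)).real
    return ac / ac.flat[0]  # zero-lag correlation normalized to 1
```

On a synthetic texture with horizontal period 4, the map peaks again at lag 4, exactly the cue that repeating high-frequency detail leaves behind.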
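The training objective in step 4 combines pixel-wise and perceptual terms with the distribution-alignment penalty. Only the alignment term is sketched here, and its squared-error form is an assumption; the paper's exact divergence measure is not reproduced:

```python
def distribution_alignment_loss(stats_prev, stats_curr):
    """Penalize the gap between successive super-pixel statistics.

    Hypothetical squared-error form of the alignment term, where
    `stats_*[id] = (mean, variance)` per super-pixel region.
    """
    total = 0.0
    for sp_id, (mu_p, var_p) in stats_prev.items():
        mu_c, var_c = stats_curr[sp_id]
        total += (mu_c - mu_p) ** 2 + (var_c - var_p) ** 2
    return total / len(stats_prev)
```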

Results & Findings

| Method (×4 SR) | PSNR ↑ | SSIM ↑ |
| --- | --- | --- |
| CASR (single model) | 31.8 dB | 0.894 |
| Prior art (multi‑model) | 30.5 dB | 0.877 |
| Baseline cyclic (no SDAM/SARM) | 30.9 dB | 0.882 |

  • Extreme scales (×16, ×32, ×64): CASR retains visual fidelity, while competing methods exhibit severe blur and ringing.
  • Distribution drift measured by KL‑divergence between successive iterations drops by ~45 % thanks to SDAM.
  • Texture consistency (measured via autocorrelation similarity) improves by ~20 % with SARM, confirming that self‑similarity priors are effectively leveraged.
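Drift of the kind SDAM suppresses can be quantified as a KL divergence between histograms of successive feature maps. The measurement sketch below is one plausible protocol, an assumption rather than the paper's exact procedure:

```python
import numpy as np

def histogram_kl(feat_prev, feat_curr, bins=64, eps=1e-8):
    """KL divergence between value histograms of two feature maps.

    A simple way to quantify iteration-to-iteration distribution drift;
    smaller is better. Both maps are binned over a shared value range.
    """
    lo = min(feat_prev.min(), feat_curr.min())
    hi = max(feat_prev.max(), feat_curr.max())
    p, _ = np.histogram(feat_prev, bins=bins, range=(lo, hi))
    q, _ = np.histogram(feat_curr, bins=bins, range=(lo, hi))
    p = p / p.sum() + eps  # smooth to avoid log(0)
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))
```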

Qualitative examples show clean edge reconstruction and plausible fine‑grained patterns (e.g., fabric weave, foliage) even at 64× magnification.

Practical Implications

  • Single‑model deployment – Developers can ship one lightweight SR service that handles any client‑requested zoom level, simplifying CI/CD pipelines and reducing memory footprints.
  • Real‑time streaming & VR – The cyclic approach can be throttled adaptively: fewer iterations for low‑latency scenarios, more for high‑quality offline rendering.
  • Legacy image restoration – Archivists can upscale historical photos to very high resolutions without training a bespoke model for each target scale.
  • Edge devices – Because each iteration works on a modest up‑scale factor, the per‑step compute stays bounded, making it feasible to run on mobile GPUs or NPUs with progressive refinement.

Limitations & Future Work

  • Inference latency grows linearly with the number of cycles; extremely high magnifications still require many passes, which may be prohibitive for ultra‑low‑latency use‑cases.
  • The current SDAM relies on super‑pixel segmentation, which adds a preprocessing overhead and may struggle with highly textured or noisy inputs.
  • Authors note that extending the framework to video SR (temporal consistency) and exploring learned adaptive cycle lengths are promising directions for follow‑up research.

Authors

  • Wenhao Guo
  • Zhaoran Zhao
  • Peng Lu
  • Sheng Li
  • Qian Qiao
  • RuiDe Li

Paper Information

  • arXiv ID: 2602.22159v1
  • Categories: cs.CV
  • Published: February 25, 2026
