[Paper] ClusIR: Towards Cluster-Guided All-in-One Image Restoration

Published: December 11, 2025 at 01:59 PM EST
Source: arXiv - 2512.10948v1

Overview

The paper introduces ClusIR, an “all‑in‑one” image‑restoration (AiOIR) framework that handles many kinds of image degradation (blur, noise, compression artifacts, and mixtures of them) within a single model. By explicitly clustering degradation types and using those clusters to guide both spatial and frequency‑domain processing, ClusIR is reported to achieve higher visual fidelity than prior universal restorers while remaining a single deployable model.

Key Contributions

  • Cluster‑guided degradation semantics: Learns a probabilistic clustering of degradation types, turning vague “unknown degradation” into explicit, interpretable clusters.
  • Probabilistic Cluster‑Guided Routing Mechanism (PCGRM): Decouples degradation recognition from expert activation, allowing the model to route image patches to the most suitable restoration experts in a stable, differentiable way.
  • Degradation‑Aware Frequency Modulation Module (DAFMM): Uses the cluster cues to adaptively decompose and modulate frequency components, boosting both structural (low‑frequency) and textural (high‑frequency) recovery.
  • Unified spatial‑frequency synergy: The two modules work together, letting semantic degradation cues directly influence frequency‑domain adjustments—something most prior AiOIR methods ignore.
  • Extensive benchmark validation: Demonstrates competitive or state‑of‑the‑art results on multiple standard restoration datasets, including mixed‑degradation scenarios that are notoriously hard for single‑task models.

Methodology

  1. Learnable Degradation Clustering

    • The network first extracts a compact feature vector for each input image (or patch).
    • A lightweight clustering head predicts a probability distribution over K degradation clusters (e.g., “Gaussian noise”, “motion blur”, “JPEG compression”).
    • These probabilities are treated as soft labels, so the model can express uncertainty when degradations are mixed.
  2. Probabilistic Cluster‑Guided Routing (PCGRM)

    • Each cluster is associated with a small “expert” sub‑network specialized for that degradation family.
    • The soft cluster probabilities weight the outputs of all experts, effectively routing the image through a blend of experts rather than a hard switch.
    • This design keeps gradients stable during training and avoids the “expert collapse” problem seen in hard‑routing mixtures of experts.
  3. Degradation‑Aware Frequency Modulation (DAFMM)

    • The routed feature map is passed through a frequency‑decomposition block (e.g., a learnable wavelet or Fourier split).
    • Cluster probabilities modulate the gain applied to each frequency band, allowing the model to amplify or suppress each band according to the identified degradation (e.g., attenuate noise‑dominated high‑frequency content for denoising, or amplify high‑frequency detail for deblurring while leaving low‑frequency structure intact).
    • The modulated bands are recombined, yielding a restored image that respects both structural integrity and fine texture.
  4. Training Objective

    • The loss combines a reconstruction term (L1/L2), a perceptual term (VGG‑based), and a clustering regularizer that encourages distinct cluster embeddings.
    • End‑to‑end training lets the clustering, routing, and frequency modules co‑adapt.
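
The three forward-pass steps above (soft clustering, soft routing, frequency modulation) can be sketched on a toy 1‑D signal. This is a minimal illustration under stated assumptions, not the paper's implementation: the two experts, the moving-average band split, the per-cluster gain table, and the hard-coded cluster logits are all hypothetical stand-ins for learned components.

```python
import math

def softmax(logits):
    """Turn raw cluster scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_route(signal, probs, experts):
    """Blend every expert's output, weighted by its cluster probability."""
    outputs = [expert(signal) for expert in experts]
    return [sum(p * out[i] for p, out in zip(probs, outputs))
            for i in range(len(signal))]

def split_bands(signal, radius=1):
    """Split into a low band (moving average) and a high band (residual)."""
    low = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        low.append(sum(window) / len(window))
    high = [s - l for s, l in zip(signal, low)]
    return low, high

def modulate(low, high, probs, band_gains):
    """Scale each band by the probability-weighted mix of per-cluster gains."""
    g_low = sum(p * gains[0] for p, gains in zip(probs, band_gains))
    g_high = sum(p * gains[1] for p, gains in zip(probs, band_gains))
    return [g_low * l + g_high * h for l, h in zip(low, high)]

# Hypothetical setup with K = 2 clusters: "noise-like" and "blur-like".
def smooth_expert(x):
    """Stand-in for a denoising expert: keeps only the low band."""
    return split_bands(x)[0]

def identity_expert(x):
    """Stand-in for an expert that leaves the signal unchanged."""
    return list(x)

experts = [smooth_expert, identity_expert]
band_gains = [(1.0, 0.5),   # noise-like cluster: damp the high band
              (1.0, 1.5)]   # blur-like cluster: boost the high band

signal = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
cluster_logits = [2.0, 0.0]  # assumed output of the clustering head
probs = softmax(cluster_logits)

routed = soft_route(signal, probs, experts)         # step 2: blended experts
low, high = split_bands(routed)                     # step 3: decompose
restored = modulate(low, high, probs, band_gains)   # step 3: modulate and recombine
```

Because the blend and the gains are both weighted sums over the cluster probabilities, every expert and every gain entry contributes to the output, which is what would let gradients reach all of them in a trained version of this design.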

Results & Findings

  • Quantitative gains: Across five benchmark suites (e.g., DIV2K‑Denoise, GoPro‑Deblur, JPEG‑Artifacts), ClusIR improves PSNR by 0.3–0.9 dB over the strongest baselines, with SSIM improving as well; the biggest jumps are on mixed‑degradation test sets.
  • Visual quality: Side‑by‑side comparisons show sharper edges, fewer ringing artifacts, and more natural textures, especially when an image suffers from simultaneous blur and compression.
  • Ablation studies: Removing PCGRM lowers PSNR by ~0.5 dB, while disabling DAFMM leads to noticeable texture loss, indicating that both spatial routing and frequency modulation contribute to the final quality.
  • Efficiency: Despite having multiple experts, the soft routing allows parallel execution; the overall FLOPs are comparable to a single‑task restoration network, making it feasible for real‑time inference on modern GPUs.
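
To put the dB figures in context, here is a minimal sketch of how PSNR is computed from mean squared error. The flat pixel lists and the `max_value=1.0` intensity range are illustrative assumptions, not the paper's evaluation code.

```python
import math

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(restored):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [0.0, 0.5, 1.0, 0.5]
baseline = [0.1, 0.4, 0.9, 0.6]      # every pixel off by 0.1
improved = [0.05, 0.45, 0.95, 0.55]  # every pixel off by 0.05

# Halving the per-pixel error quarters the MSE, which adds about 6 dB.
gain_db = psnr(reference, improved) - psnr(reference, baseline)
```

Since PSNR is logarithmic in MSE, a gain of 0.3–0.9 dB corresponds to roughly 7% to 19% lower mean squared error.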

Practical Implications

  • Unified restoration service: Developers can expose a single API endpoint for image cleanup (e.g., user‑uploaded photos, surveillance footage) without needing to pre‑detect the degradation type.
  • Edge‑device friendliness: The soft routing and shared backbone keep memory footprints low, enabling deployment on smartphones or embedded vision modules where multiple specialized models would be impractical.
  • Content‑aware pipelines: Media platforms can automatically improve user‑generated content (social media, e‑commerce listings) even when the upload pipeline mixes compression, low‑light noise, and motion blur.
  • Improved data augmentation: Training pipelines that synthesize diverse degradations can now be validated against a single, robust model, simplifying quality‑control loops.
  • Foundation for downstream tasks: Cleaner images boost the performance of downstream CV tasks (object detection, OCR, face recognition), so integrating ClusIR as a pre‑processor can raise overall system accuracy.

Limitations & Future Work

  • Cluster granularity: The current approach fixes the number of degradation clusters K a priori; choosing K too low may under‑represent rare degradations, while choosing it too high can dilute expert specialization. Adaptive or hierarchical clustering could be explored.
  • Extreme degradations: Very severe or out‑of‑distribution corruptions (e.g., heavy rain streaks, sensor saturation) still challenge the model, suggesting a need for broader training data or additional expert modules.
  • Interpretability: While cluster probabilities are available, mapping them to human‑readable degradation names requires post‑hoc labeling; tighter integration with explicit degradation descriptors could improve transparency.
  • Real‑time constraints on low‑power hardware: Although FLOPs are comparable to single‑task models, the memory bandwidth for parallel expert execution may still be a bottleneck on ultra‑low‑power devices; model pruning or knowledge distillation of the expert ensemble is a promising direction.

ClusIR suggests that coupling semantic degradation clustering with frequency‑domain modulation is a practical route to “all‑in‑one” image restoration that handles diverse degradations in a single model.

Authors

  • Shengkai Hu
  • Jiaqi Ma
  • Jun Wan
  • Wenwen Min
  • Yongcheng Jing
  • Lefei Zhang
  • Dacheng Tao

Paper Information

  • arXiv ID: 2512.10948v1
  • Categories: cs.CV
  • Published: December 11, 2025