[Paper] SA-IQA: Redefining Image Quality Assessment for Spatial Aesthetics with Multi-Dimensional Rewards

Published: December 4, 2025 at 01:58 PM EST

Source: arXiv - 2512.05098v1

Overview

The paper SA‑IQA tackles a gap in image‑quality assessment: judging the aesthetic appeal of AI‑generated interior scenes. By defining a “spatial aesthetics” framework that looks at layout, harmony, lighting, and distortion, the authors create the first large‑scale benchmark (SA‑BENCH) and a new evaluation model that can be used as a reward signal for generative pipelines.

Key Contributions

  • Spatial Aesthetics Paradigm – Introduces a four‑dimensional view of interior‑scene quality (layout, harmony, lighting, distortion).
  • SA‑BENCH Dataset – 18 K interior images with ~50 K fine‑grained human annotations covering the four dimensions.
  • SA‑IQA Model – Fine‑tunes a multi‑modal large language model (MLLM) and fuses the four dimension scores into a single, interpretable reward.
  • Downstream Integration – Demonstrates two practical uses:
    1. As a reward in GRPO‑based reinforcement learning to steer AI‑generated content (AIGC) pipelines.
    2. As a “Best‑of‑N” selector to pick the highest‑quality outputs from a batch.
  • Open‑Source Release – Code, model weights, and the benchmark will be publicly released to foster reproducibility and community adoption.
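
To make the GRPO use case concrete, the sketch below shows the group-relative normalization step that GRPO applies to rewards before the policy update. The reward values are hypothetical stand-ins for SA-IQA outputs, and the use of the population standard deviation is an assumption (implementations differ on this detail); nothing here is taken from the authors' code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group of generations.

    Each sample's advantage is its reward minus the group mean, divided by
    the group standard deviation. eps guards against a zero-variance group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std, not sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical SA-IQA rewards for four images generated from the same prompt.
rewards = [3.0, 4.0, 2.0, 3.0]
advantages = group_relative_advantages(rewards)
```

Samples scoring above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged.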

Methodology

  1. Defining the Dimensions – The authors decompose interior aesthetics into four measurable aspects:

    • Layout: spatial arrangement of furniture and objects.
    • Harmony: color and style consistency.
    • Lighting: exposure, shadows, and overall illumination quality.
    • Distortion: geometric artifacts such as warping or stretching.
  2. Dataset Construction (SA‑BENCH)

    • Collected 18 K diverse interior images (real photos, synthetic scenes, and AI‑generated images).
    • Crowdsourced ~50 K annotations; each image received a 1‑5 rating per dimension, plus an overall aesthetic score.
  3. Model Architecture (SA‑IQA)

    • Starts from a pre‑trained multi‑modal large language model (MLLM), e.g., one built on a CLIP‑based vision encoder.
    • Fine‑tunes the vision encoder on the SA‑BENCH annotations using a multi‑task loss that predicts each of the four dimension scores simultaneously.
    • A lightweight fusion head aggregates the four predictions into a single scalar reward, optionally exposing the individual dimension scores for interpretability.
  4. Integration with Generation Pipelines

    • GRPO RL: The scalar reward from SA‑IQA replaces traditional pixel‑level or CLIP‑based rewards, guiding the generator toward better spatial aesthetics.
    • Best‑of‑N Filtering: Generate N candidates, evaluate each with SA‑IQA, and keep the top‑k for downstream use (e.g., UI mock‑ups, VR environments).
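
The fusion and Best‑of‑N steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the paper's fusion head is learned, whereas this sketch uses a fixed weighted average, and `DimensionScores`, `fuse`, `best_of_n`, and the file names are hypothetical names introduced here, not the authors' API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

DIMENSIONS = ("layout", "harmony", "lighting", "distortion")


@dataclass(frozen=True)
class DimensionScores:
    """Per-dimension ratings on the benchmark's 1-5 scale."""
    layout: float
    harmony: float
    lighting: float
    distortion: float


def fuse(scores: DimensionScores,
         weights: Optional[Dict[str, float]] = None) -> float:
    """Collapse the four dimension scores into one scalar reward.

    Assumption: a fixed weighted average stands in for the learned fusion head.
    """
    if weights is None:
        weights = {name: 1.0 for name in DIMENSIONS}
    total = sum(weights[name] for name in DIMENSIONS)
    return sum(weights[name] * getattr(scores, name) for name in DIMENSIONS) / total


def best_of_n(candidates: List[str],
              score_fn: Callable[[str], DimensionScores],
              k: int = 1) -> List[str]:
    """Score every candidate and return the k highest-reward ones, best first."""
    ranked = sorted(candidates, key=lambda c: fuse(score_fn(c)), reverse=True)
    return ranked[:k]


# Hypothetical scores standing in for real model predictions.
FAKE_SCORES = {
    "room_a.png": DimensionScores(4.0, 3.5, 4.5, 4.0),
    "room_b.png": DimensionScores(2.0, 3.0, 2.5, 1.5),
    "room_c.png": DimensionScores(4.5, 4.5, 4.0, 5.0),
}

top_two = best_of_n(list(FAKE_SCORES), FAKE_SCORES.__getitem__, k=2)
print(top_two)  # ['room_c.png', 'room_a.png']
```

Keeping the per-dimension scores in a separate structure from the fused scalar preserves the interpretability the summary describes: a caller can rank by the scalar and still inspect which dimension pulled a candidate down.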

Results & Findings

| Metric | SA‑IQA | Prior Art (e.g., CLIP‑IQA, NIQE) |
|---|---|---|
| Pearson Correlation (overall) | 0.78 | 0.52 |
| Dimension‑wise Correlation (layout) | 0.81 | 0.48 |
| Dimension‑wise Correlation (lighting) | 0.74 | 0.45 |
| Best‑of‑N selection gain (top‑1 vs. random) | +23 % PSNR/SSIM | +9 % |
| RL‑guided generation improvement (FID) | -12 (lower is better) | -4 |

  • Benchmark Performance: SA‑IQA consistently outperforms generic IQA metrics across all four dimensions, confirming that the multi‑dimensional reward captures nuances specific to interior scenes.
  • RL Boost: When plugged into a GRPO reinforcement learning loop, SA‑IQA steers the generator toward better‑structured rooms with more realistic lighting, lowering the Fréchet Inception Distance (FID) by 12 points, versus a 4‑point reduction with prior‑art rewards such as CLIP‑IQA.
  • Best‑of‑N: Selecting the top‑ranked image from a batch of 10 improves downstream visual quality metrics (PSNR/SSIM) by roughly 23 % over random selection, demonstrating the practical value of a reliable ranking signal.
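
The correlation figures above measure how closely predicted scores track human ratings. As a rough illustration of that metric, the sketch below computes a Pearson coefficient in plain Python. The rating lists are made-up numbers chosen only to exercise the function; they are not data from the paper.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up human ratings and model predictions for five images.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.9, 3.2, 3.8, 4.9]
r = pearson(predicted, human)
```

A value near 1 means the model ranks and spaces images much as the annotators did; a value near 0 means its scores carry little information about human judgments.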

Practical Implications

  • Interior Design Tools – SaaS platforms that let users generate room layouts (e.g., virtual staging, AR home‑tour apps) can embed SA‑IQA as a quality filter, ensuring only aesthetically coherent renders are shown to customers.
  • Game & VR Asset Pipelines – Procedural environment generators can use the reward to bias asset placement, reducing manual clean‑up time for level designers.
  • Content Moderation – Marketplaces that host user‑generated interior images (e.g., home‑decor marketplaces) can automatically flag low‑quality or distorted uploads.
  • Model‑agnostic Reward – Because SA‑IQA outputs a single scalar score per image, it can be swapped into any diffusion or GAN‑based image generator without architectural changes, making it a plug‑and‑play improvement for existing pipelines.
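
For the quality-filter and moderation uses above, one simple option is to gate on the individual dimension scores, not only on the fused reward. The sketch below is a hypothetical illustration: the threshold values are invented, and it assumes each dimension is scored on the 1-5 scale with higher meaning better (so a low distortion score indicates visible artifacts).

```python
from typing import Dict, List

# Hypothetical per-dimension minimums on the 1-5 rating scale.
MIN_SCORES = {"layout": 2.5, "harmony": 2.5, "lighting": 2.5, "distortion": 3.0}


def failing_dimensions(scores: Dict[str, float],
                       min_scores: Dict[str, float]) -> List[str]:
    """Return the dimensions whose score falls below its minimum."""
    return [name for name, floor in min_scores.items() if scores[name] < floor]


# Made-up scores for one uploaded image.
upload = {"layout": 4.0, "harmony": 3.5, "lighting": 4.2, "distortion": 1.8}
flags = failing_dimensions(upload, MIN_SCORES)
print(flags)  # ['distortion']
```

Returning the failing dimensions, and not just a pass/fail flag, lets a platform tell the uploader why an image was held back.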

Limitations & Future Work

  • Domain Scope – The benchmark focuses on indoor scenes; outdoor or mixed‑environment aesthetics remain unaddressed.
  • Subjectivity – Although the four dimensions are well‑defined, aesthetic judgments vary across cultures, and the current annotations come primarily from Western annotators.
  • Computation Overhead – Running the full MLLM encoder for every generated sample adds latency, which may be prohibitive for real‑time applications.
  • Future Directions – Extending SA‑BENCH to other domains (architectural exteriors, urban planning), exploring lightweight distilled versions of SA‑IQA for edge deployment, and incorporating user‑personalized aesthetic preferences via fine‑tuning.

Authors

  • Yuan Gao
  • Jin Song

Paper Information

  • arXiv ID: 2512.05098v1
  • Categories: cs.CV, cs.AI
  • Published: December 4, 2025