[Paper] SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis

Published: April 14, 2026 at 01:59 PM EDT
Source: arXiv - 2604.13035v1

Overview

The paper introduces SceneCritic, a symbolic, rule‑based evaluator that checks the plausibility of 3‑D indoor scene layouts at the floor‑plan level. By grounding its constraints in a newly built spatial ontology (SceneOnto), SceneCritic can automatically flag semantic, orientation, and geometric errors—something that current LLM/VLM judges struggle with because they depend on rendered images and are highly sensitive to viewpoint and prompt wording.

Key Contributions

  • SceneOnto – a unified spatial ontology compiled from 3D‑FRONT, ScanNet, and Visual Genome that encodes common indoor object relationships, orientations, and size constraints.
  • SceneCritic – a symbolic evaluator that traverses SceneOnto to verify layout coherence, delivering fine‑grained, object‑level diagnostics instead of a single scalar score.
  • Critic Modalities Benchmark – an experimental test‑bed that compares three feedback loops for iterative scene synthesis:
    1. rule‑based collision constraints,
    2. text‑only LLM critic,
    3. image‑based VLM critic.
  • Human‑Alignment Study – empirical evidence that SceneCritic’s scores correlate far more closely with human judgments than existing VLM‑based evaluators.
  • Insightful Findings – text‑only LLMs surprisingly outperform VLMs on pure semantic layout quality, while VLM‑driven refinement excels at fixing orientation and spatial alignment issues.

Methodology

1. Data Fusion & Ontology Construction

  • Extracted object co‑occurrence, typical orientations (e.g., “sofa faces TV”), and size statistics from three large datasets.
  • Normalized and merged these priors into a graph‑structured ontology (SceneOnto) where nodes are object categories and edges encode relational constraints (e.g., “must be adjacent”, “cannot overlap”).
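The normalize-and-merge step can be illustrated with a small sketch. It covers only the co‑occurrence prior: each dataset's pair counts are divided by that dataset's scene count and then averaged across datasets, so one large corpus does not dominate the merged graph. The function name `build_ontology`, the averaging scheme, and the `min_support` threshold are assumptions for illustration, not SceneOnto's actual procedure. Orientation and size priors would be stored as additional typed edges on the same graph.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_ontology(scenes_by_dataset, min_support=0.1):
    """Hypothetical sketch: merge per-dataset co-occurrence priors into a graph.

    scenes_by_dataset maps a dataset name to a list of scenes, each scene being
    a list of object category strings. Returns {category: {neighbor: weight}},
    where weight is the per-dataset normalized co-occurrence frequency averaged
    over all datasets.
    """
    merged = defaultdict(lambda: defaultdict(float))
    n_datasets = len(scenes_by_dataset)
    for scenes in scenes_by_dataset.values():
        pair_counts = Counter()
        for scene in scenes:
            # Count each unordered category pair at most once per scene.
            for a, b in combinations(sorted(set(scene)), 2):
                pair_counts[(a, b)] += 1
        for (a, b), count in pair_counts.items():
            # Normalize by scene count so dataset size does not dominate.
            freq = count / len(scenes)
            merged[a][b] += freq / n_datasets
            merged[b][a] += freq / n_datasets
    # Keep only edges whose averaged frequency clears the support threshold.
    return {
        a: {b: w for b, w in nbrs.items() if w >= min_support}
        for a, nbrs in merged.items()
    }
```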

2. Symbolic Evaluation Engine (SceneCritic)

  • Input: a floor‑plan layout expressed as a list of objects with class, position, and orientation.
  • Checks three families of constraints:
    • Semantic – is the object plausible in the given room context?
    • Orientation – are directional relationships satisfied (e.g., “bed head against wall”)?
    • Geometric – are there collisions or impossible size ratios?
  • Output: a structured report containing per‑object pass/fail flags and the specific violated rule.
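A minimal sketch of the three constraint families follows. It assumes axis‑aligned rectangular footprints, a yaw angle measured counter‑clockwise from the +x axis, and a tiny hand‑written rule table (`ALLOWED`, `FACES`) standing in for SceneOnto. All names here (`Obj`, `ObjReport`, `evaluate`, the 30° facing tolerance) are hypothetical; the paper's actual rule set and input format may differ.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    category: str
    x: float        # footprint centre, metres
    y: float
    width: float    # footprint extent along x (assumed axis-aligned)
    depth: float    # footprint extent along y
    yaw_deg: float  # facing direction, 0 = +x axis, counter-clockwise


@dataclass
class ObjReport:
    passed: bool = True
    violations: list = field(default_factory=list)


# Hypothetical stand-in for an ontology slice; not the real SceneOnto contents.
ALLOWED = {"living_room": {"sofa", "tv", "coffee_table", "lamp"}}
FACES = {("sofa", "tv")}  # an object of the first category should face one of the second


def overlaps(a, b):
    # Axis-aligned footprints intersect if centre distance < half-extent sum on both axes.
    return (abs(a.x - b.x) * 2 < a.width + b.width
            and abs(a.y - b.y) * 2 < a.depth + b.depth)


def faces(a, b, tol_deg=30.0):
    # Bearing from a to b compared with a's yaw, wrapped into [-180, 180).
    to_target = math.degrees(math.atan2(b.y - a.y, b.x - a.x))
    diff = (to_target - a.yaw_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= tol_deg


def evaluate(room_type, objects):
    reports = {o.name: ObjReport() for o in objects}

    def flag(obj, msg):
        reports[obj.name].passed = False
        reports[obj.name].violations.append(msg)

    for o in objects:
        # Semantic: is the category plausible for this room type?
        if o.category not in ALLOWED.get(room_type, set()):
            flag(o, f"semantic: {o.category} unexpected in {room_type}")
        # Orientation: if a facing rule applies and targets exist, one must be faced.
        for src, dst in FACES:
            targets = [t for t in objects if t.category == dst]
            if o.category == src and targets and not any(faces(o, t) for t in targets):
                flag(o, f"orientation: {o.name} not facing {dst}")
    # Geometric: pairwise footprint collisions, reported on both objects.
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            if overlaps(a, b):
                flag(a, f"geometric: {a.name} overlaps {b.name}")
                flag(b, f"geometric: {b.name} overlaps {a.name}")
    return reports
```

Returning the list of violated rules per object, rather than one aggregate number, is what lets the report double as feedback for a refinement loop.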

3. Iterative Refinement Test‑bed

  • Rule‑based critic – feeds back collision violations as hard constraints.
  • LLM critic – serializes the layout into natural‑language statements; an LLM suggests edits.
  • VLM critic – renders the layout from multiple viewpoints, feeds images to a vision‑language model, and receives corrective suggestions.
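For the text‑only LLM critic, the layout must be turned into prose before it can go into a prompt. The sketch below shows one possible serialization, assuming each object is a dict with `name`, `category`, `x`, `y`, and `yaw` (degrees, counter‑clockwise from +x, which is labeled "east"). The sentence templates and the `serialize_layout` name are hypothetical; the paper's exact prompt format is not given here.

```python
import math

COMPASS = ["east", "north", "west", "south"]


def facing_word(yaw_deg):
    # Snap a yaw angle (0 = +x = "east", counter-clockwise) to the nearest compass word.
    return COMPASS[round((yaw_deg % 360) / 90) % 4]


def serialize_layout(room_type, objects):
    """Turn a layout into plain sentences for a text-only LLM critic (hypothetical format)."""
    lines = [f"Room type: {room_type}."]
    for o in objects:
        lines.append(
            f"The {o['name']} ({o['category']}) is at "
            f"({o['x']:.1f}, {o['y']:.1f}) m, facing {facing_word(o['yaw'])}."
        )
    # Pairwise distances give the LLM relational context without any image.
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            d = math.dist((a["x"], a["y"]), (b["x"], b["y"]))
            lines.append(f"The {a['name']} is {d:.1f} m from the {b['name']}.")
    return "\n".join(lines)
```

Because every object pair yields a distance sentence, the text grows quadratically with scene size, which is one concrete form of the verbosity concern listed under Limitations.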

4. Evaluation

  • Collected human ratings on a subset of generated scenes.
  • Measured correlation (Spearman’s ρ) between each evaluator’s scores and human judgments.
  • Compared final layout quality after a fixed number of refinement iterations per critic modality.
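Spearman's ρ is the Pearson correlation computed on ranks, so it only asks whether an evaluator orders scenes the way humans do, not whether its scores are on the same scale. A dependency‑free sketch, with tied values receiving the average of the rank positions they occupy:

```python
def average_ranks(values):
    # Assign 1-based ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    # Spearman's rho = Pearson correlation of the two rank vectors.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

In practice `scipy.stats.spearmanr` computes the same statistic; the hand‑written version just makes the ranking step explicit.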

Results & Findings

Evaluator | Correlation with human scores | Semantic quality ↑ | Orientation / geometry ↑
SceneCritic (symbolic) | 0.78 | 0.81 | 0.74
VLM‑based evaluator | 0.45 | 0.48 | 0.42
LLM‑only (text) | 0.62 | 0.85 | 0.55
VLM‑driven refinement (final layout) | n/a | 0.78 | 0.81
  • Alignment: SceneCritic’s scores track human judgments substantially better than either the VLM‑based evaluator or the text‑only LLM (0.78 vs. 0.45 and 0.62 correlation).
  • Semantic Edge: Pure text LLMs (e.g., GPT‑4) capture object‑type plausibility without visual input, outperforming VLMs on that dimension (0.85 vs. 0.48 semantic quality).
  • Orientation Fixes: When the critic operates on rendered images, the refinement loop corrects facing directions and collision issues more effectively than rule‑only feedback.
  • Iterative Gains: After three refinement cycles, VLM‑driven feedback yields the highest combined semantic‑orientation score, while rule‑based feedback quickly eliminates gross collisions but plateaus on higher‑level semantics.
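The pattern of rule‑based feedback clearing gross collisions quickly and then plateauing can be seen in a toy loop. The hypothetical `refine_collisions` below treats objects as axis‑aligned footprints (dicts with `x`, `y`, `w`, `d`), pushes each colliding pair apart along the axis of least penetration, and records the collision count per cycle. It is a sketch under those assumptions, not the paper's rule‑based critic; note that nothing in it can reason about semantics or facing direction.

```python
def overlap_depth(a, b):
    # Penetration along x and y for axis-aligned footprints; <= 0 means no overlap.
    ox = (a["w"] + b["w"]) / 2 - abs(a["x"] - b["x"])
    oy = (a["d"] + b["d"]) / 2 - abs(a["y"] - b["y"])
    return ox, oy


def refine_collisions(objects, max_iters=3):
    """Hypothetical rule-based critic loop; mutates the object dicts in place."""
    history = []
    for _ in range(max_iters):
        collisions = 0
        for i, a in enumerate(objects):
            for b in objects[i + 1:]:
                ox, oy = overlap_depth(a, b)
                if ox <= 0 or oy <= 0:
                    continue  # footprints do not intersect (touching is allowed)
                collisions += 1
                # Separate along the axis with smaller penetration, half each way.
                if ox < oy:
                    sign = 1 if a["x"] >= b["x"] else -1
                    a["x"] += sign * ox / 2
                    b["x"] -= sign * ox / 2
                else:
                    sign = 1 if a["y"] >= b["y"] else -1
                    a["y"] += sign * oy / 2
                    b["y"] -= sign * oy / 2
        history.append(collisions)
        if collisions == 0:
            break  # geometry is clean; this loop has nothing further to offer
    return objects, history
```

Resolving one pair can push an object into a third, which is why the loop runs for a capped number of cycles rather than a single pass.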

Practical Implications

  • Robust Automated QA for Asset Pipelines – game studios and AR/VR developers can plug SceneCritic into procedural generation pipelines to catch impossible object placements before costly rendering or physics simulation.
  • Debug‑Friendly Feedback – because SceneCritic returns explicit rule violations, developers receive actionable diagnostics (“sofa overlaps wall”, “lamp not facing desk”) instead of opaque confidence scores.
  • Hybrid Generation Strategies – the findings point to a two‑stage approach: use an LLM to draft a semantically sound layout, then hand it to a VLM‑based refinement loop for fine‑grained orientation and collision fixes.
  • Dataset‑Driven Ontology Updates – the ontology can be refreshed with new domain‑specific priors (e.g., office vs. residential) to tailor the evaluator for specialized interior‑design tools.
  • Benchmark Standardization – SceneCritic offers a reproducible, viewpoint‑independent metric that could become a community benchmark for 3‑D scene synthesis research, reducing reliance on noisy human‑in‑the‑loop evaluations.
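As an illustration of wiring explicit diagnostics into an asset pipeline, the sketch below blocks a layout on selected constraint families and passes the rest through as warnings, producing messages in the "sofa overlaps wall" style. The `Violation` record, its fields, the `qa_gate` name, and the choice to block only on geometric errors by default are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    family: str       # "semantic", "orientation", or "geometric"
    subject: str      # offending object, e.g. "sofa"
    relation: str     # e.g. "overlaps", "not facing"
    target: str = ""  # other object involved, if any

    def message(self):
        # Human-readable diagnostic such as "sofa overlaps wall".
        return f"{self.subject} {self.relation} {self.target}".strip()


def qa_gate(violations, blocking=("geometric",)):
    """Hypothetical pipeline gate: fail on blocking families, warn on the rest."""
    blockers = [v.message() for v in violations if v.family in blocking]
    warnings = [v.message() for v in violations if v.family not in blocking]
    return len(blockers) == 0, blockers, warnings
```

Keeping the blocking families configurable lets a studio fail builds on physical impossibilities while treating softer semantic or orientation issues as review notes.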

Limitations & Future Work

  • Ontology Coverage – SceneOnto is limited to the object categories present in the three source datasets; exotic or custom assets may lack appropriate constraints.
  • Floor‑Plan Focus – the evaluator operates at the 2‑D layout level and does not directly assess 3‑D details such as mesh quality, material realism, or lighting.
  • Scalability of Textual Conversion – translating large, complex scenes into natural‑language prompts for LLM critics can become verbose and may lose nuance.
  • Future Directions – extending the ontology to incorporate functional affordances (e.g., “chair must be reachable from desk”), integrating multi‑modal feedback loops (simultaneous LLM + VLM), and exploring learned symbolic constraints that adapt from user‑generated correction data.

Authors

  • Kathakoli Sengupta
  • Kai Ao
  • Paola Cascante‑Bonilla

Paper Information

  • arXiv ID: 2604.13035v1
  • Categories: cs.CV, cs.CL
  • Published: April 14, 2026