[Paper] Visual Commonsense Driven Knowledge Refinements for Scene Graph Generation

Published: June 4, 2026 at 12:36 PM EDT
Source: arXiv - 2606.06369v1

Overview

Learning-driven Scene Graph Generation (SGG) models excel on frequent relation types but degrade sharply under annotation sparsity, failing to capture reliable visual commonsense knowledge. We propose a model‑agnostic, semantically‑guided knowledge refinement framework that systematically mines commonsense‑grounded constraints from training data—capturing spatial, functional, and qualitative relational regularities—and uses general declarative commonsense reasoning to correct and refine ranked SGG predictions at inference time. The framework requires no manual rule authoring, no model retraining, and transfers across datasets and architectures. On three standard benchmarks, we obtain consistent improvements over strong baselines, demonstrating that structured visual commonsense reasoning over deep scene semantics is a practical and effective complement to purely learning‑based scene graph generation.

Key Contributions

  • The paper introduces a semantically‑guided knowledge refinement framework for SGG that:
    • Mines commonsense constraints automatically from training data.
    • Applies declarative reasoning to refine predictions at inference.
    • Works model‑agnostically without retraining.
    • Shows consistent gains across multiple benchmarks and architectures.

Methodology

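The abstract outlines two stages. First, constraints capturing spatial, functional, and qualitative relational regularities are mined automatically from training data. Second, declarative commonsense reasoning uses those constraints to correct and refine ranked SGG predictions at inference time. Please refer to the full paper for the detailed methodology.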
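
The following toy Python sketch illustrates the general shape of a "mine constraints from training data, then refine ranked predictions at inference" pipeline. It is not the paper's method: a simple frequency threshold and a multiplicative score penalty stand in for the mined commonsense constraints and the declarative reasoning the authors describe. The function names (`mine_constraints`, `refine`), the parameters (`min_support`, `penalty`), and the example triples are all hypothetical.

```python
from collections import Counter


def mine_constraints(train_triples, min_support=2):
    """Return the (subject, predicate, object) triples seen at least
    `min_support` times in the training annotations.

    Assumption: frequency is used here as a crude proxy for plausibility.
    """
    counts = Counter(train_triples)
    return {triple for triple, n in counts.items() if n >= min_support}


def refine(ranked_predictions, allowed, penalty=0.5):
    """Down-weight predictions whose triple was not mined as plausible,
    then re-sort by the adjusted score. The underlying model is untouched.
    """
    rescored = [
        (s, p, o, score if (s, p, o) in allowed else score * penalty)
        for s, p, o, score in ranked_predictions
    ]
    return sorted(rescored, key=lambda r: r[3], reverse=True)


# Hypothetical training annotations, including one noisy triple.
train = [
    ("person", "riding", "horse"),
    ("person", "riding", "horse"),
    ("cup", "on", "table"),
    ("cup", "on", "table"),
    ("horse", "riding", "person"),  # appears once, so it is not mined
]
allowed = mine_constraints(train)

# Hypothetical ranked output of some SGG model: (subject, predicate, object, score).
preds = [
    ("horse", "riding", "person", 0.60),
    ("person", "riding", "horse", 0.55),
]
refined = refine(preds, allowed)
print(refined[0][:3])
```

Because the refinement step only consumes ranked predictions, a sketch like this can wrap any model's output without retraining, which is the model-agnostic property the overview emphasizes.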

Practical Implications

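Because the framework refines ranked predictions at inference time and requires neither manual rule authoring nor model retraining, it can be layered on top of existing SGG models. This makes it a practical way to improve robustness on relation types with sparse annotations, where purely learning‑based models degrade.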

Authors

  • Maëlic Neau
  • Salim Baloch
  • Jakob Suchan
  • Zoe Falomir
  • Mehul Bhatt

Paper Information

  • arXiv ID: 2606.06369v1
  • Categories: cs.CV
  • Published: June 4, 2026