[Paper] Visual Commonsense Driven Knowledge Refinements for Scene Graph Generation
Source: arXiv - 2606.06369v1
Overview
Learning-driven Scene Graph Generation (SGG) models excel on frequent relation types but degrade sharply under annotation sparsity, failing to capture reliable visual commonsense knowledge. We propose a model‑agnostic, semantically‑guided knowledge refinement framework that systematically mines commonsense‑grounded constraints from training data—capturing spatial, functional, and qualitative relational regularities—and uses general declarative commonsense reasoning to correct and refine ranked SGG predictions at inference time. The framework requires no manual rule authoring, no model retraining, and transfers across datasets and architectures. On three standard benchmarks, we obtain consistent improvements over strong baselines, demonstrating that structured visual commonsense reasoning over deep scene semantics is a practical and effective complement to purely learning‑based scene graph generation.
Key Contributions
- The paper introduces a semantically‑guided knowledge refinement framework for SGG that:
  - Mines commonsense constraints automatically from training data.
  - Applies declarative reasoning to refine predictions at inference.
  - Works model‑agnostically without retraining.
  - Shows consistent gains across multiple benchmarks and architectures.
Methodology
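The abstract describes a two-stage approach. First, commonsense-grounded constraints covering spatial, functional, and qualitative relational regularities are mined automatically from the training data. Second, at inference time, general declarative commonsense reasoning uses these constraints to correct and refine the ranked predictions of an existing SGG model. Please refer to the full paper for the detailed methodology.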
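As a rough illustration of the mine-then-refine idea from the overview, the sketch below collects relation triples seen in training data and uses them to reorder a model's ranked predictions. It is not the authors' method: the paper relies on declarative commonsense reasoning, whereas this example swaps in a simple set-membership check. The function names (`mine_constraints`, `refine`), the `min_support` parameter, and the toy data are all hypothetical.

```python
from collections import Counter

def mine_constraints(training_triples, min_support=1):
    """Return the (subject, predicate, object) class triples seen at least
    min_support times. Hypothetical stand-in for the paper's constraint mining."""
    counts = Counter(training_triples)
    return {triple for triple, n in counts.items() if n >= min_support}

def refine(ranked_predictions, allowed):
    """Stable re-rank: predictions consistent with the mined triples come first;
    the original score order is preserved within each group."""
    consistent = [p for p in ranked_predictions if p[:3] in allowed]
    violating = [p for p in ranked_predictions if p[:3] not in allowed]
    return consistent + violating

# Toy training annotations (assumed data, not from any benchmark).
train = [
    ("person", "riding", "horse"),
    ("person", "riding", "horse"),
    ("person", "wearing", "hat"),
    ("cup", "on", "table"),
]
allowed = mine_constraints(train)

# Toy ranked model output: (subject, predicate, object, score), best first.
preds = [
    ("horse", "riding", "person", 0.81),
    ("person", "riding", "horse", 0.77),
    ("table", "on", "cup", 0.60),
    ("cup", "on", "table", 0.55),
]
refined = refine(preds, allowed)
for p in refined:
    print(p)
```

The refinement step only reorders the model's output, so it needs no retraining. This mirrors the model-agnostic, inference-time property claimed in the abstract, although a real system would need softer handling of unseen but plausible triples.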
Practical Implications
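Because the framework operates at inference time and requires neither model retraining nor manually authored rules, it can be layered on top of an existing SGG model. This makes it a practical way to improve robustness on sparsely annotated relation types, where purely learning-based models degrade, and the abstract reports that the approach transfers across datasets and architectures.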
Authors
- Maëlic Neau
- Salim Baloch
- Jakob Suchan
- Zoe Falomir
- Mehul Bhatt
Paper Information
- arXiv ID: 2606.06369v1
- Categories: cs.CV
- Published: June 4, 2026