[Paper] Where Does the Answer Come From? Benchmarking View-Level Visual Evidence Identification in Multi-View MLLMs for Autonomous Driving

Published: June 8, 2026 at 11:39 AM EDT
Source: arXiv


Overview

Multimodal large language models (MLLMs) achieve strong results on visual reasoning benchmarks, but answer accuracy alone does not indicate whether a model relied on the correct visual evidence. This gap matters especially in the multi-view scenes used for autonomous driving, where a model can produce a plausible answer while grounding it in the wrong camera view.

We introduce a multi-view visual question answering benchmark for evaluating evidence-source identification: given six synchronized nuScenes views and a question, the model must identify the supporting camera view and answer the question. The benchmark contains 122 conflict-centric question-answer pairs from 73 scenes, spanning causality, counterfactual reasoning, and intent prediction. View labels are proposed by an automatic conflict-mining pipeline and manually verified by annotators.

We evaluate three settings: camera-view selection, oracle QA given the golden view, and joint prediction, in which the model selects a view and answers in one pass. Answers are evaluated in both multiple-choice and free-form formats, using exact match for structured predictions and an LLM judge for free-form responses. By explicitly separating visual-source identification from answer correctness, the benchmark exposes grounding failures that answer-only evaluation misses.
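To make the separation between view selection and answer correctness concrete, here is a minimal sketch of how joint predictions might be scored with exact match. It is not the paper's evaluation code: the `Example` record, the `score_joint` function, and the metric names are hypothetical, and only the multiple-choice (exact-match) case is covered, not the LLM judge used for free-form answers. The camera names are the six nuScenes camera channels.

```python
from dataclasses import dataclass

# The six nuScenes camera channels.
CAMERAS = (
    "CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT",
    "CAM_BACK", "CAM_BACK_LEFT", "CAM_BACK_RIGHT",
)


@dataclass(frozen=True)
class Example:
    """Hypothetical record: a camera view plus a multiple-choice answer."""
    view: str    # e.g. "CAM_FRONT"
    answer: str  # e.g. "B"


def normalize(text: str) -> str:
    # Exact match after trimming whitespace and ignoring case.
    return text.strip().upper()


def score_joint(preds, golds):
    """Score view selection and answers separately, then jointly."""
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equal in length")

    view_hits = answer_hits = both_hits = ungrounded_hits = 0
    for pred, gold in zip(preds, golds):
        if normalize(gold.view) not in CAMERAS:
            raise ValueError(f"unknown gold view: {gold.view!r}")
        view_ok = normalize(pred.view) == normalize(gold.view)
        answer_ok = normalize(pred.answer) == normalize(gold.answer)
        view_hits += view_ok
        answer_hits += answer_ok
        both_hits += view_ok and answer_ok
        # Correct answer attributed to the wrong camera: the failure
        # that answer-only evaluation cannot see.
        ungrounded_hits += answer_ok and not view_ok

    n = len(golds)
    return {
        "view_accuracy": view_hits / n,
        "answer_accuracy": answer_hits / n,
        "joint_accuracy": both_hits / n,
        "right_answer_wrong_view": ungrounded_hits / n,
    }


golds = [
    Example("CAM_FRONT", "A"),
    Example("CAM_BACK", "B"),
    Example("CAM_FRONT_LEFT", "C"),
    Example("CAM_BACK_RIGHT", "D"),
]
preds = [
    Example("cam_front", "a"),       # view and answer both correct
    Example("CAM_FRONT", "B"),       # correct answer, wrong view
    Example("CAM_FRONT_LEFT", "A"),  # correct view, wrong answer
    Example("CAM_BACK", "A"),        # both wrong
]
scores = score_joint(preds, golds)
```

In this toy run the answer accuracy is 0.5, but only half of those correct answers are attributed to the right camera, so the joint accuracy falls to 0.25. Reporting `right_answer_wrong_view` alongside answer accuracy is one way to surface the grounding failures the overview describes.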

Key Contributions

The paper is listed under the following arXiv categories:

  • cs.CL (Computation and Language)
  • cs.CV (Computer Vision and Pattern Recognition)

Methodology

Please refer to the full paper for detailed methodology.

Practical Implications

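Because the benchmark scores view selection separately from answer correctness, it can be used to check whether a multi-view driving MLLM grounds its answer in the right camera view, rather than only whether the answer is correct.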

Authors

  • Yimu Wang
  • Yee Man Choi
  • Barry Zhang
  • Mozhgan Nasr Azadani
  • Sean Sedwards
  • Krzysztof Czarnecki

Paper Information

  • arXiv ID: 2606.09644v1
  • Categories: cs.CL, cs.CV
  • Published: June 8, 2026