[Paper] Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding
Large Vision-Language Models (LVLMs) face a fundamental dilemma in video reasoning: they are caught between the prohibitive computational costs of verbose reaso...