[Paper] OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding
Omnimodal large language models have made significant strides in unifying audio and visual modalities; however, they often lack the fine-grained cross-modal und...