[Paper] GateFusion: Hierarchical Gated Cross-Modal Fusion for Active Speaker Detection
Active Speaker Detection (ASD) aims to identify who is currently speaking in each frame of a video. Most state-of-the-art approaches rely on late fusion to comb...