[Paper] Evaluating the encoding competence of visual language models using uncommon actions
We propose the UAIT (Uncommon-sense Action Image-Text) dataset, a new evaluation benchmark designed to test the semantic understanding ability of visual language mo...