[Paper] Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
Vision-language models (VLMs) excel at general understanding yet remain weak at dynamic spatial reasoning (DSR), i.e., reasoning about the evolution of object g...