[Paper] VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?
The computational and memory overheads associated with expanding the context window of LLMs severely limit their scalability. A noteworthy solution is vision-te...