Learning GUI Grounding with Spatial Reasoning from Visual Feedback Paper • 2509.21552 • Published 25 days ago • 11
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent Paper • 2508.06600 • Published Aug 8 • 39
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency Paper • 2504.18589 • Published Apr 24 • 13 • 3
What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations Paper • 2502.08279 • Published Feb 12 • 1
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published May 15 • 53
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization Paper • 2502.17540 • Published Feb 24 • 3
Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs Paper • 2502.05092 • Published Feb 7 • 8