Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published 7 days ago • 21
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published 17 days ago • 49
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation Paper • 2501.03225 • Published 24 days ago • 7
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 139
Revisiting Active Learning in the Era of Vision Foundation Models Paper • 2401.14555 • Published Jan 25, 2024