Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions Paper • 2411.09018 • Published 11 days ago
LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content Paper • 2410.10783 • Published Oct 14
ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation Paper • 2403.01306 • Published Mar 2
MOCHa: Multi-Objective Reinforcement Mitigating Caption Hallucinations Paper • 2312.03631 • Published Dec 6, 2023