- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 39
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 20
Collections including paper arxiv:2412.08687

- Rethinking Data Selection at Scale: Random Selection is Almost All You Need
  Paper • 2410.09335 • Published • 16
- From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
  Paper • 2410.06456 • Published • 35
- Emergent properties with repeated examples
  Paper • 2410.07041 • Published • 8
- Personalized Visual Instruction Tuning
  Paper • 2410.07113 • Published • 69

- MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation
  Paper • 2412.07147 • Published • 5
- Grounding Descriptions in Images informs Zero-Shot Visual Recognition
  Paper • 2412.04429 • Published
- Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models
  Paper • 2412.05939 • Published • 12
- Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
  Paper • 2412.08737 • Published • 48

- GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
  Paper • 2411.18499 • Published • 18
- VLSBench: Unveiling Visual Leakage in Multimodal Safety
  Paper • 2411.19939 • Published • 9
- AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
  Paper • 2412.02611 • Published • 22
- U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs
  Paper • 2412.03205 • Published • 14

- MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
  Paper • 2405.07526 • Published • 18
- Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach
  Paper • 2405.15613 • Published • 13
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 13
- How Do Large Language Models Acquire Factual Knowledge During Pretraining?
  Paper • 2406.11813 • Published • 30