DINOv2: Learning Robust Visual Features without Supervision Paper • 2304.07193 • Published Apr 14, 2023 • 5
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer Paper • 2311.17267 • Published Nov 28, 2023 • 1
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions Paper • 2312.08578 • Published Dec 14, 2023 • 16
Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI Paper • 2308.05221 • Published Aug 9, 2023 • 9
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning Paper • 2309.02591 • Published Sep 5, 2023 • 14
Text Quality-Based Pruning for Efficient Training of Language Models Paper • 2405.01582 • Published Apr 26, 2024
The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks Paper • 2408.10446 • Published Aug 19, 2024 • 6
DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization Paper • 2501.03271 • Published 6 days ago • 7
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 39