PaliGemma 2: A Family of Versatile VLMs for Transfer • Paper • 2412.03555 • Published 14 days ago • 116
PaliGemma 2 Release • Collection • Vision-Language Models available in multiple 3B, 10B and 28B variants. • 23 items • Updated 5 days ago • 112
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion • Paper • 2412.04424 • Published 13 days ago • 53
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models • Paper • 2411.18613 • Published 21 days ago • 50
microsoft/LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned • Zero-Shot Classification • Updated 29 days ago • 11.9k • 27
A Case Study of Web App Coding with OpenAI Reasoning Models • Paper • 2409.13773 • Published Sep 19 • 5
Adaptive Caching for Faster Video Generation with Diffusion Transformers • Paper • 2411.02397 • Published Nov 4 • 23