video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model Paper • 2502.11775 • Published 13 days ago • 8
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Paper • 2410.13863 • Published Oct 17, 2024 • 37
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 573
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published Aug 23, 2024 • 26
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering Paper • 2408.09702 • Published Aug 19, 2024 • 11
Running on CPU Upgrade 1.89k 1.89k Stable Diffusion XL on TPUv5e 🏋 Generate images from text prompts with various styles
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models Paper • 2405.15574 • Published May 24, 2024 • 55