Autoregressive Video Generation without Vector Quantization Paper • 2412.14169 • Published about 1 month ago • 14
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Paper • 2411.17188 • Published Nov 26, 2024 • 21
deepdml/faster-whisper-large-v3-turbo-ct2 Automatic Speech Recognition • Updated Oct 27, 2024 • 166k • 88
Emu3 Collection Emu3: Next-Token Prediction is All You Need • 7 items • Updated 5 days ago • 68
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters Paper • 2402.04252 • Published Feb 6, 2024 • 25
Generative Multimodal Models are In-Context Learners Paper • 2312.13286 • Published Dec 20, 2023 • 34
CapsFusion: Rethinking Image-Text Data at Scale Paper • 2310.20550 • Published Oct 31, 2023 • 25