OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing Paper • 2509.24900 • Published Sep 29 • 53
view article Article NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks By nvidia and 4 others • Aug 11 • 74
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation Paper • 2508.09987 • Published Aug 13 • 25
view article Article Introducing Command A Vision: Multimodal AI built for Business By CohereLabs and 3 others • Jul 31 • 63
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model Paper • 2507.01953 • Published Jul 2 • 19
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published Jun 20 • 63
LettinGo: Explore User Profile Generation for Recommendation System Paper • 2506.18309 • Published Jun 23 • 11
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation Paper • 2506.10540 • Published Jun 12 • 37
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder Paper • 2505.07916 • Published May 12 • 132
view article Article DiffRhythm: Revolutionizing Open Source AI Music Generator By Dzkaka • Mar 5 • 13
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published Apr 10 • 50
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Paper • 2504.01956 • Published Apr 2 • 41