MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Paper • 2408.02718 • Published Aug 5 • 60
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12 • 52
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 155
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model Paper • 2408.10198 • Published Aug 19 • 32
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents Paper • 2408.07060 • Published Aug 13 • 40
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13 • 65
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search Paper • 2408.08152 • Published Aug 15 • 52
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 97
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 115