OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published 10 days ago • 43
Unbounded: A Generative Infinite Game of Character Life Simulation Paper • 2410.18975 • Published 16 days ago • 34
WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published 17 days ago • 16
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published 19 days ago • 65
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation Paper • 2410.15748 • Published 20 days ago • 12
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation Paper • 2410.13726 • Published 23 days ago • 10
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation Paper • 2410.13232 • Published 24 days ago • 40
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control Paper • 2410.13830 • Published 23 days ago • 23
GS^3: Efficient Relighting with Triple Gaussian Splatting Paper • 2410.11419 • Published 26 days ago • 10
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code Paper • 2410.08196 • Published about 1 month ago • 44
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4 • 32
Pyramidal Flow Matching for Efficient Video Generative Modeling Paper • 2410.05954 • Published Oct 8 • 37
Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models Paper • 2311.13141 • Published Nov 22, 2023 • 13
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes Paper • 2311.13384 • Published Nov 22, 2023 • 50
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8 • 107
ColPali: Efficient Document Retrieval with Vision Language Models 👀 Article • By manu • Jul 5 • 153
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation Paper • 2410.05363 • Published Oct 7 • 44