CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published 2 days ago • 33
AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation Paper • 2411.17383 • Published 3 days ago • 4
Style-Friendly SNR Sampler for Style-Driven Generation Paper • 2411.14793 • Published 7 days ago • 35
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published 7 days ago • 40
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published 14 days ago • 61
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation Paper • 2411.08033 • Published 17 days ago • 21
AnimateAnything: Consistent and Controllable Animation for Video Generation Paper • 2411.10836 • Published 13 days ago • 20
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published 17 days ago • 26
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM Paper • 2411.04954 • Published 22 days ago • 8
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models Paper • 2411.07232 • Published 18 days ago • 62
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published 18 days ago • 28
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Paper • 2410.10139 • Published Oct 14 • 50
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Paper • 2409.01322 • Published Sep 2 • 94
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models Paper • 2408.11318 • Published Aug 21 • 54
Article ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models By yuchenlin • Jul 27 • 24