Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models Paper • 2403.16999 • Published Mar 25, 2024 • 5
ID-Animator: Zero-Shot Identity-Preserving Human Video Generation Paper • 2404.15275 • Published Apr 23, 2024
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation Paper • 2503.20519 • Published Mar 26, 2025
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Paper • 2507.10548 • Published Jul 14, 2025 • 36
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models Paper • 2511.16668 • Published Nov 2025 • 53
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models Paper • 2309.12307 • Published Sep 21, 2023 • 89