CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation Paper • 2504.00043 • Published Mar 30, 2025 • 10
SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper Paper • 2601.19194 • Published 7 days ago • 3
OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution Paper • 2601.20380 • Published 5 days ago • 8
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 235
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning Paper • 2601.16163 • Published 11 days ago • 13
CamCloneMaster: Enabling Reference-based Camera Control for Video Generation Paper • 2506.03140 • Published Jun 3, 2025 • 1
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published Dec 9, 2025 • 132
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published Jan 1 • 130
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer Paper • 2601.01425 • Published 29 days ago • 52
LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published 27 days ago • 145
Papers Collection Large Language Model (LLM) and NLP-related papers. • 342 items • Updated 4 days ago • 13
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14, 2025 • 300