Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference Paper • 2508.02193 • Published Aug 4, 2025 • 137
Omni-Embed-Nemotron: A Unified Multimodal Retrieval Model for Text, Image, Audio, and Video Paper • 2510.03458 • Published Oct 3, 2025 • 3
LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models Paper • 2510.15227 • Published Oct 17, 2025 • 2
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation Paper • 2602.12160 • Published 28 days ago • 38
Adam Improves Muon: Adaptive Moment Estimation with Orthogonalized Momentum Paper • 2602.17080 • Published 21 days ago • 3
Can Training Dynamics of Scale-Invariant Neural Networks Be Explained by the Thermodynamics of an Ideal Gas? Paper • 2511.07308 • Published Nov 10, 2025 • 1
MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models Paper • 2602.10934 • Published 29 days ago • 49
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems Paper • 2506.16381 • Published Jun 19, 2025 • 4
Music Flamingo: Scaling Music Understanding in Audio Language Models Paper • 2511.10289 • Published Nov 13, 2025 • 18
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published Jan 5, 2026 • 112
CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models Paper • 2601.05329 • Published Jan 8, 2026 • 1
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published Jan 2, 2026 • 57
Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation Paper • 2512.21734 • Published Dec 25, 2025 • 5
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation Paper • 2512.23576 • Published Dec 29, 2025 • 65