Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction Paper • 2501.03218 • Published Jan 6
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published 15 days ago