arxiv:2501.05037
Yuxuan Wang
ColorfulAI
AI & ML interests
Multimodal Learning
Recent Activity
authored a paper 5 days ago
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
authored a paper 5 days ago
Collaborative Reasoning on Multi-Modal Semantic Graphs for Video-Grounded Dialogue Generation
authored a paper 5 days ago
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding