arxiv:2504.00557
Ki-Ung Song
sk851
AI & ML interests
Generative model / Multimodal
Recent Activity
upvoted a paper 17 days ago
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
upvoted a paper 5 months ago
Seeing Voices: Generating A-Roll Video from Audio with Mirage
authored a paper 7 months ago
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features