Kyusong Lee

kyusonglee

Recent Activity

liked a Space 3 days ago

omlab/VLM-R1-Referral-Expression

reacted to tianchez's post with 🚀 7 days ago

Introducing VLM-R1! GRPO helped DeepSeek R1 learn to reason. Can it also help VLMs perform better on general computer vision tasks? The answer is yes, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated it on RefCOCO Val and RefGTA (an OOD task). https://github.com/om-ai-lab/VLM-R1

authored a paper about 1 month ago

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

kyusonglee's activity

liked a Space 3 days ago

VLM R1 Referral Expression

Highlight described objects in images
