One Vision-Language-Action Model for GUI Agent
Qinghong (Kevin) Lin
KevinQHLin
AI & ML interests
Vision-Language Model, Video Understanding, Human-AI Interaction
Recent Activity
upvoted a paper (5 days ago): See the Text: From Tokenization to Visual Reading
liked a model (8 days ago): deepseek-ai/DeepSeek-OCR
upvoted a paper (14 days ago): Diffusion Transformers with Representation Autoencoders