arxiv:2406.18521
Zirui Wang
zwcolin
AI & ML interests
My general research interests lie in two directions: (1) understanding and harnessing the synergy between generative and understanding modeling objectives, and (2) aligning image and text in different modalities, especially when text (and other arbitrary, non-natural structures such as graphs and flowcharts) appears in the visual representation.
Recent Activity
upvoted a paper 25 days ago
Learning Video Representations without Natural Videos
upvoted a paper about 1 month ago
Distill Visual Chart Reasoning Ability from LLMs to MLLMs