
Boyuan Sun

BoyuanSun

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 5 days ago

See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

Paper • 2605.18018 • Published 7 days ago • 19

upvoted a paper 3 months ago

GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics

Paper • 2602.12617 • Published Feb 13 • 20

upvoted a paper 11 months ago

Depth Anything at Any Condition

Paper • 2507.01634 • Published Jul 2, 2025 • 49

liked 2 models 11 months ago

BBBBCHAN/LLaVA-Scissor-baseline-0.5B

Video-Text-to-Text • 0.9B • Updated Jul 1, 2025 • 4 • 4

BBBBCHAN/LLaVA-Scissor-baseline-7B

Video-Text-to-Text • 8B • Updated Jul 1, 2025 • 13 • 3

upvoted a paper 11 months ago

LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Paper • 2506.21862 • Published Jun 27, 2025 • 36

upvoted 4 papers about 1 year ago

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Paper • 2504.07960 • Published Apr 10, 2025 • 50

upvoted 10 papers over 1 year ago

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Paper • 2501.03218 • Published Jan 6, 2025 • 35

DepthLab: From Partial to Complete

Paper • 2412.18153 • Published Dec 24, 2024 • 36

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 380

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published Dec 12, 2024 • 97

CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation

Paper • 2306.04300 • Published Jun 7, 2023 • 2

VideoLLM-online: Online Video Large Language Model for Streaming Video

Paper • 2406.11816 • Published Jun 17, 2024 • 26

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 103

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 161

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published Dec 5, 2024 • 118
