3 2 1

Long Zhao

garyzhao9012

https://garyzhao.github.io/

AI & ML interests

Computer Vision, Machine Perception, Machine Learning

Recent Activity

authored a paper 24 days ago

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

upvoted a paper about 1 month ago

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

upvoted a paper 3 months ago

Video Creation by Demonstration

View all activity

Organizations

garyzhao9012's activity

authored a paper 24 days ago

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

Paper • 2502.03628 • Published Feb 5 • 12

upvoted a paper about 1 month ago

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

Paper • 2502.03628 • Published Feb 5 • 12

upvoted a paper 3 months ago

Video Creation by Demonstration

Paper • 2412.09551 • Published Dec 12, 2024 • 9

commented a paper 3 months ago

Video Creation by Demonstration

Paper • 2412.09551 • Published Dec 12, 2024 • 9 •

liked a model 5 months ago

CompVis/stable-diffusion-v1-4

Text-to-Image • Updated Aug 23, 2023 • 1.11M • • 6.72k

authored a paper 5 months ago

$ε$-VAE: Denoising as Visual Decoding

Paper • 2410.04081 • Published Oct 5, 2024 • 7

commented a paper 5 months ago

$ε$-VAE: Denoising as Visual Decoding

Paper • 2410.04081 • Published Oct 5, 2024 • 7 •

authored 4 papers 5 months ago

Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding

Paper • 2303.16341 • Published Mar 28, 2023

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

Paper • 2105.12723 • Published May 26, 2021

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Paper • 2307.03166 • Published Jul 6, 2023 • 5

Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition

Paper • 2308.11489 • Published Aug 22, 2023

authored 2 papers about 1 year ago

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 24

Distilling Vision-Language Models on Millions of Videos

Paper • 2401.06129 • Published Jan 11, 2024 • 17

authored a paper over 1 year ago

Unified Visual Relationship Detection with Vision and Language Models

Paper • 2303.08998 • Published Mar 16, 2023