HAODONG DUAN

KennyUTC

https://kennymckormick.github.io

AI & ML interests

Video Understanding; Multi-Modal Learning

Recent Activity

updated a Space 4 days ago

opencompass/open_vlm_leaderboard

updated a dataset 15 days ago

VLMEval/OpenVLMRecords

authored a paper 28 days ago

Articles

Claude-3.5 Evaluation Results on Open VLM Leaderboard

RealWorldQA, What's New?

KennyUTC's activity

upvoted a paper 28 days ago

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published 29 days ago • 34

upvoted a paper 29 days ago

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published 30 days ago • 43

upvoted 4 papers about 1 month ago

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Paper • 2410.13861 • Published Oct 17 • 53

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

Paper • 2410.16256 • Published about 1 month ago • 58

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published about 1 month ago • 65

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

Paper • 2410.12405 • Published Oct 16 • 13

upvoted a collection 2 months ago

VisionLM

473 items • Updated 4 days ago • 31

upvoted a paper 2 months ago

POINTS: Improving Your Vision-language Model with Affordable Strategies

Paper • 2409.04828 • Published Sep 7 • 22

upvoted a collection 3 months ago

VILA: On Pre-training for Visual Language Models

10 items • Updated 22 days ago • 45

upvoted a paper 3 months ago

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

Paper • 2408.03361 • Published Aug 6 • 85

upvoted a paper 4 months ago

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Paper • 2407.11691 • Published Jul 16 • 13

upvoted a paper 5 months ago

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3 • 92

upvoted a collection 5 months ago

InternVL 2.0

Expanding Performance Boundaries of Open-Source MLLM • 16 items • Updated about 24 hours ago • 76

upvoted a paper 5 months ago

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

Paper • 2406.17770 • Published Jun 25 • 18

upvoted a collection 5 months ago

AI Paper of the Day

A collection of papers that I think are interesting, one added each day • 222 items • Updated 1 day ago • 28

upvoted an article 5 months ago

Claude-3.5 Evaluation Results on Open VLM Leaderboard

Article • Jun 24 • 5

upvoted 3 papers 5 months ago

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Paper • 2406.14515 • Published Jun 20 • 32

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Paper • 2406.14544 • Published Jun 20 • 34

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Paper • 2406.11833 • Published Jun 17 • 61

upvoted a paper 6 months ago

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 72