Weiyun Wang

Weiyun1025

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

commented a paper 4 days ago

VisionArena: 230K Real World User-VLM Conversations with Preference Labels

upvoted a paper 4 days ago

VisionArena: 230K Real World User-VLM Conversations with Preference Labels

Weiyun1025's activity

upvoted a paper 1 day ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published 5 days ago • 35

upvoted a paper 4 days ago

VisionArena: 230K Real World User-VLM Conversations with Preference Labels

Paper • 2412.08687 • Published 6 days ago • 11

upvoted a paper 5 days ago

Phi-4 Technical Report

Paper • 2412.08905 • Published 6 days ago • 82

upvoted 3 papers 7 days ago

upvoted a paper 9 days ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published 11 days ago • 110

upvoted a paper 12 days ago

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Paper • 2412.04454 • Published 12 days ago • 45

upvoted a paper 26 days ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15 • 61

upvoted a collection 27 days ago

InternVL 2.5

Collection

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling • 18 items • Updated about 16 hours ago • 67

upvoted a collection 28 days ago

InternVL Data

Collection

7 items • Updated 6 days ago • 4

upvoted 2 papers about 1 month ago

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Paper • 2411.04923 • Published Nov 7 • 20

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7 • 110

upvoted 5 papers 2 months ago

Rethinking Data Selection at Scale: Random Selection is Almost All You Need

Paper • 2410.09335 • Published Oct 12 • 16

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

Paper • 2410.07985 • Published Oct 10 • 27

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Paper • 2410.10563 • Published Oct 14 • 38

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

Paper • 2410.10139 • Published Oct 14 • 51

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Paper • 2410.09732 • Published Oct 13 • 54

upvoted 2 papers 3 months ago

MinerU: An Open-Source Solution for Precise Document Content Extraction

Paper • 2409.18839 • Published Sep 27 • 26

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27 • 92