Zhengyuan Yang

zyang39

https://zhengyuan.info/

Recent Activity

upvoted a paper about 1 month ago

RAGEN-2: Reasoning Collapse in Agentic RL

Paper • 2604.06268 • Published Apr 7 • 66

authored a paper 6 months ago

Computer-Use Agents as Judges for Generative User Interface

Paper • 2511.15567 • Published Nov 19, 2025 • 54

upvoted a paper 6 months ago

Computer-Use Agents as Judges for Generative User Interface

Paper • 2511.15567 • Published Nov 19, 2025 • 54

upvoted 2 papers 7 months ago

SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

Paper • 2510.06917 • Published Oct 8, 2025 • 35

EdiVal-Agent: An Object-Centric Framework for Automated, Scalable, Fine-Grained Evaluation of Multi-Turn Editing

Paper • 2509.13399 • Published Sep 16, 2025 • 5

authored 15 papers 7 months ago

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation

Paper • 2304.06671 • Published Apr 13, 2023

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

Paper • 2309.17421 • Published Sep 29, 2023 • 4

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation

Paper • 2310.07749 • Published Oct 11, 2023 • 5

GPT-4V(ision) as A Social Media Analysis Engine

Paper • 2311.07547 • Published Nov 13, 2023 • 1

MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Paper • 2306.04216 • Published Jun 7, 2023

InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models

Paper • 2312.13503 • Published Dec 21, 2023

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17

Bring Metric Functions into Diffusion Models

Paper • 2401.02414 • Published Jan 4, 2024

DisCo: Disentangled Control for Referring Human Dance Generation in Real World

Paper • 2307.00040 • Published Jun 30, 2023 • 26

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

Paper • 2109.05014 • Published Sep 10, 2021 • 1

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

Paper • 2111.12085 • Published Nov 23, 2021

GIT: A Generative Image-to-text Transformer for Vision and Language

Paper • 2205.14100 • Published May 27, 2022 • 2
