Yuanxing Zhang's picture

24 2

Yuanxing Zhang

LongoXC

·

AI & ML interests

None yet

Recent Activity

upvoted a paper 10 days ago

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

upvoted a paper 20 days ago

GARDO: Reinforcing Diffusion Models without Reward Hacking

upvoted a paper 27 days ago

GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

View all activity

Organizations

None yet

upvoted a paper 10 days ago

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Paper • 2601.10061 • Published 11 days ago • 30

upvoted a paper 20 days ago

GARDO: Reinforcing Diffusion Models without Reward Hacking

Paper • 2512.24138 • Published 27 days ago • 29

upvoted a paper 27 days ago

GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

Paper • 2512.15560 • Published Dec 17, 2025 • 25

upvoted 4 papers about 1 month ago

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

Paper • 2512.21094 • Published Dec 24, 2025 • 25

SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published Dec 23, 2025 • 92

Kling-Omni Technical Report

Paper • 2512.16776 • Published Dec 18, 2025 • 168

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Paper • 2512.12675 • Published Dec 14, 2025 • 41

upvoted a paper 2 months ago

Monet: Reasoning in Latent Visual Space Beyond Images and Language

Paper • 2511.21395 • Published Nov 26, 2025 • 17

upvoted 7 papers 3 months ago

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

Paper • 2511.07250 • Published Nov 10, 2025 • 18

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

Paper • 2510.17722 • Published Oct 20, 2025 • 20

IF-VidCap: Can Video Caption Models Follow Instructions?

Paper • 2510.18726 • Published Oct 21, 2025 • 26

VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning

Paper • 2510.10518 • Published Oct 12, 2025 • 19

ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding

Paper • 2510.11498 • Published Oct 13, 2025 • 11

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

Paper • 2510.10689 • Published Oct 12, 2025 • 47

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration

Paper • 2510.10395 • Published Oct 12, 2025 • 31

upvoted 3 papers 4 months ago

VideoScore2: Think before You Score in Generative Video Evaluation

Paper • 2509.22799 • Published Sep 26, 2025 • 26

OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

Paper • 2509.24900 • Published Sep 29, 2025 • 53

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

Paper • 2509.24897 • Published Sep 29, 2025 • 46

upvoted a paper 7 months ago

ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation

Paper • 2507.04952 • Published Jul 7, 2025 • 11

upvoted a paper 10 months ago

A Comprehensive Survey on Long Context Language Modeling

Paper • 2503.17407 • Published Mar 20, 2025 • 49