Zhaokai Wang

wzk1015

https://www.wzk.plus

AI & ML interests

Computer Vision, Music Generation, Multimodal Large Language Models

Recent Activity

liked a model 6 days ago

OpenGVLab/InternVL3_5-241B-A28B-HF

liked a model 10 days ago

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview

authored a paper 10 days ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

upvoted a paper 10 days ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published 10 days ago • 177

upvoted a paper 28 days ago

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

Paper • 2508.05635 • Published 28 days ago • 72

upvoted a paper about 2 months ago

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models

Paper • 2507.12566 • Published Jul 16 • 14

upvoted a paper 3 months ago

Large Language Models for Data Synthesis

Paper • 2505.14752 • Published May 20 • 50

upvoted 2 papers 4 months ago

EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models

Paper • 2505.09694 • Published May 14 • 19

EnerVerse-AC: Envisioning Embodied Environments with Action Condition

Paper • 2505.09723 • Published May 14 • 23

upvoted a collection 4 months ago

Qwen2.5

Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated Jul 21 • 638

upvoted a paper 5 months ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 134

upvoted a collection 5 months ago

PIIP

[NeurIPS 2024 Spotlight (Ranking Top 10), TPAMI 2025] Parameter-Inverted Image Pyramid Networks • 11 items • Updated Aug 2 • 1

upvoted a paper 5 months ago

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Paper • 2504.02826 • Published Apr 3 • 69

upvoted 2 papers 6 months ago

Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning

Paper • 2503.11646 • Published Mar 14 • 36

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25 • 75

upvoted a paper 8 months ago

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Paper • 2501.07783 • Published Jan 14 • 7

upvoted 3 papers 9 months ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 39

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Paper • 2412.09428 • Published Dec 12, 2024 • 7

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 160

upvoted a paper 10 months ago

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training

Paper • 2410.08202 • Published Oct 10, 2024 • 4

upvoted a collection 10 months ago

InternVL2.5

Better than InternVL 2.0 • 19 items • Updated Apr 20 • 92

upvoted a collection 11 months ago

Mono-InternVL

A Pioneering Monolithic MLLM • 8 items • Updated Jul 22 • 6

upvoted a paper about 1 year ago

Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

Paper • 2407.08770 • Published Jul 11, 2024 • 21