Ligeng Zhu

Ligeng-Zhu

AI & ML interests

None yet

Recent Activity

liked a dataset 27 days ago

lmms-lab/LLaVA-558K-Webdataset

updated a model about 1 month ago

Ligeng-Zhu/data.zip

published a model about 1 month ago

Ligeng-Zhu/data.zip

Organizations

upvoted a collection 2 months ago

InternVL3.5

Collection

This collection includes all released checkpoints of InternVL3.5, covering different training stages (e.g., Pretraining, SFT, MPO, Cascade RL). • 54 items • Updated Sep 28, 2025 • 104

upvoted an article 7 months ago

Article

The Common Pile v0.1

Jun 6, 2025

upvoted an article 10 months ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Mar 12, 2025 • 480

upvoted a collection about 1 year ago

Sana

Collection

⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 21 items • Updated Sep 13, 2025 • 98

upvoted 2 papers over 1 year ago

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 121

Wolf: Captioning Everything with a World Summarization Framework

Paper • 2407.18908 • Published Jul 26, 2024 • 32

upvoted a collection over 1 year ago

VILA: On Pre-training for Visual Language Models

Collection

10 items • Updated Sep 13, 2025 • 57

upvoted a paper over 1 year ago

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023 • 20

upvoted 3 papers about 2 years ago

VILA: On Pre-training for Visual Language Models

Paper • 2312.07533 • Published Dec 12, 2023 • 21

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Paper • 2310.04378 • Published Oct 6, 2023 • 22

PockEngine: Sparse and Efficient Fine-tuning in a Pocket

Paper • 2310.17752 • Published Oct 26, 2023 • 15
