Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking Paper • 2502.02339 • Published 13 days ago • 21
Ovis2 Collection Our latest advancement in multi-modal large language models (MLLMs) • 8 items • Updated about 13 hours ago • 37
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published 27 days ago • 93
VideoLLaMA3 Collection Frontier Multimodal Foundation Models for Video Understanding • 14 items • Updated 10 days ago • 11
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 26 days ago • 83
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 27 days ago • 82
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4 • 90
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 99
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 41
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated 7 days ago • 59
Inf-CL Collection The corresponding demos/checkpoints/papers/datasets of Inf-CL. • 2 items • Updated 24 days ago • 3
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models Paper • 2410.23266 • Published Oct 30, 2024 • 20
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 89
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective Paper • 2410.12490 • Published Oct 16, 2024 • 8
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio Paper • 2410.12787 • Published Oct 16, 2024 • 31