Barry Li

Brilliant-B

AI & ML interests

None yet

Organizations

None yet

Brilliant-B's activity

upvoted a paper 3 days ago

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Paper • 2412.21059 • Published 10 days ago • 17

upvoted a paper 4 days ago

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Paper • 2412.18619 • Published 25 days ago • 49

upvoted 3 papers 8 days ago

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published 16 days ago • 65

OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System

Paper • 2412.20005 • Published 13 days ago • 17

Slow Perception: Let's Perceive Geometric Figures Step-by-step

Paper • 2412.20631 • Published 11 days ago • 13

upvoted a paper 10 days ago

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models

Paper • 2412.18609 • Published 16 days ago • 15

upvoted a paper 14 days ago

Progressive Multimodal Reasoning via Active Retrieval

Paper • 2412.14835 • Published 21 days ago • 71

upvoted a paper 2 months ago

Physics in Next-token Prediction

Paper • 2411.00660 • Published Nov 1, 2024 • 14

upvoted 3 papers 3 months ago

TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models

Paper • 2410.10818 • Published Oct 14, 2024 • 15

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Paper • 2409.20566 • Published Sep 30, 2024 • 55

Hyper-Connections

Paper • 2409.19606 • Published Sep 29, 2024 • 21

upvoted 2 papers 5 months ago

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 127

Do Vision and Language Models Share Concepts? A Vector Space Alignment Study

Paper • 2302.06555 • Published Feb 13, 2023 • 9

upvoted a collection 6 months ago

VILA: On Pre-training for Visual Language Models

Collection • 10 items • Updated Oct 31, 2024 • 48

upvoted a paper 6 months ago

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3, 2024 • 65

upvoted an article 6 months ago

Unlocking Longer Generation with Key-Value Cache Quantization

Article • May 16, 2024 • 37

upvoted 4 papers 12 months ago

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17, 2024 • 59