Byung-Kwan Lee

BK-Lee

https://sites.google.com/view/byungkwanlee

AI & ML interests

Computer Vision, Machine Learning, Large Language and Vision Models, Efficient Modeling

BK-Lee's activity

upvoted a paper about 3 hours ago

Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published about 13 hours ago • 38

upvoted 2 papers 5 days ago

Phi-4 Technical Report

Paper • 2412.08905 • Published 6 days ago • 82

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published 5 days ago • 87

upvoted 4 papers 9 days ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published 11 days ago • 110

upvoted a paper 15 days ago

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

Paper • 2412.01822 • Published 15 days ago • 14

upvoted a paper 30 days ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15 • 109

upvoted a paper about 2 months ago

FlatQuant: Flatness Matters for LLM Quantization

Paper • 2410.09426 • Published Oct 12 • 12

upvoted 3 papers 2 months ago

Pixtral 12B

Paper • 2410.07073 • Published Oct 9 • 60

Intriguing Properties of Large Language and Vision Models

Paper • 2410.04751 • Published Oct 7 • 16

MM-Ego: Towards Building Egocentric Multimodal LLMs

Paper • 2410.07177 • Published Oct 9 • 20

upvoted 7 papers 3 months ago

VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

Paper • 2409.17066 • Published Sep 25 • 27

MIO: A Foundation Model on Multimodal Tokens

Paper • 2409.17692 • Published Sep 26 • 52

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27 • 92

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Paper • 2409.17481 • Published Sep 26 • 46

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25 • 103

MaskBit: Embedding-free Image Generation via Bit Tokens

Paper • 2409.16211 • Published Sep 24 • 16

Phantom of Latent for Large Language and Vision Models

Paper • 2409.14713 • Published Sep 23 • 27