Hyogun Lee

Haawron

AI & ML interests

Video understanding, multi-modal LLMs

Organizations

None yet

Haawron's activity

upvoted a paper about 2 hours ago

Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

Paper • 2412.00493 • Published 18 days ago • 16

upvoted a paper about 9 hours ago

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 5 days ago • 48

commented a paper about 9 hours ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published 5 days ago • 119

upvoted 3 papers 2 days ago

New activity in lmms-lab/llava-onevision-qwen2-0.5b-si 5 days ago

Training time

#3 opened 5 days ago by Haawron

upvoted 3 papers 6 days ago

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Paper • 2412.03069 • Published 14 days ago • 30

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published 13 days ago • 103

EXAONE 3.5: Series of Large Language Models for Real-world Use Cases

Paper • 2412.04862 • Published 12 days ago • 46

liked a model 6 days ago

meta-llama/Llama-3.2-90B-Vision-Instruct

Image-Text-to-Text • Updated 13 days ago • 88.4k • 303

liked a model 15 days ago

Bllossom/llama-3-Korean-Bllossom-70B

Text Generation • Updated 5 days ago • 674 • 78

liked 2 models 3 months ago

stabilityai/stable-diffusion-2-1-base

Text-to-Image • Updated Jul 5, 2023 • 470k • 637

stabilityai/stable-diffusion-2-1

Text-to-Image • Updated Jul 5, 2023 • 1.22M • 3.91k

commented a paper 3 months ago

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models

Paper • 2409.13592 • Published Sep 20 • 48

commented a paper 7 months ago

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published May 31 • 63

upvoted a paper 7 months ago

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published May 31 • 63

upvoted 2 collections 7 months ago

LLaVA-1.5

Collection

A collection of LLaVA-1.5 checkpoints • 4 items • Updated Jan 31 • 18

LLaVA-1.6

Collection

A collection of LLaVA-1.6 checkpoints • 4 items • Updated Jan 31 • 67

upvoted a paper 7 months ago

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20 • 46