Jaehyun Jun

btjhjeon

https://btjhjeon.github.io/

btjhjeon

AI & ML interests

Multimodal

btjhjeon's activity

upvoted a paper 3 days ago

EMMA: End-to-End Multimodal Model for Autonomous Driving

Paper • 2410.23262 • Published 10 days ago • 2

upvoted 3 papers 5 days ago

DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models

Paper • 2411.00836 • Published 11 days ago • 14

TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models

Paper • 2410.23266 • Published 10 days ago • 19

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published 10 days ago • 43

upvoted a paper 9 days ago

BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Paper • 2410.21969 • Published 12 days ago • 8

upvoted 2 papers 10 days ago

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Paper • 2410.19168 • Published 16 days ago • 19

Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

Paper • 2410.18558 • Published 17 days ago • 17

upvoted 4 papers 11 days ago

CLEAR: Character Unlearning in Textual and Visual Modalities

Paper • 2410.18057 • Published 17 days ago • 197

GPT-4o System Card

Paper • 2410.21276 • Published 15 days ago • 76

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Paper • 2410.19100 • Published 16 days ago • 6

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Paper • 2410.21169 • Published 12 days ago • 29

upvoted 8 papers 14 days ago

NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples

Paper • 2410.14669 • Published 22 days ago • 35

TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts

Paper • 2410.18071 • Published 17 days ago • 6

LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

Paper • 2410.17434 • Published 18 days ago • 24

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published 18 days ago • 34

ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning

Paper • 2410.17779 • Published 18 days ago • 7

CAMEL-Bench: A Comprehensive Arabic LMM Benchmark

Paper • 2410.18976 • Published 16 days ago • 8

WAFFLE: Multi-Modal Model for Automated Front-End Development

Paper • 2410.18362 • Published 17 days ago • 11

Distill Visual Chart Reasoning Ability from LLMs to MLLMs

Paper • 2410.18798 • Published 17 days ago • 19

upvoted a paper 18 days ago

Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities

Paper • 2410.11190 • Published 26 days ago • 20