Conghui He

conghui

AI & ML interests

None yet

Organizations

None yet

conghui's activity

authored a paper 18 days ago

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Paper • 2501.05510 • Published 21 days ago • 39

authored a paper about 1 month ago

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Paper • 2412.11863 • Published Dec 16, 2024 • 4

authored 3 papers about 2 months ago

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published Dec 12, 2024 • 93

Chimera: Improving Generalist Model with Domain-Specific Experts

Paper • 2412.05983 • Published Dec 8, 2024 • 9

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Paper • 2412.07626 • Published Dec 10, 2024 • 22

upvoted a paper about 2 months ago

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Paper • 2412.07626 • Published Dec 10, 2024 • 22

authored 2 papers about 2 months ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 129

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

Paper • 2412.02592 • Published Dec 3, 2024 • 22

liked a dataset about 2 months ago

opendatalab/OmniDocBench

Viewer • Updated Dec 25, 2024 • 984 • 1.88k • 19

liked a Space 2 months ago

MinerU

Space • Running on L4 • 217

authored 2 papers 3 months ago

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published Oct 23, 2024 • 34

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published Oct 22, 2024 • 45

upvoted 2 papers 4 months ago

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Paper • 2410.12628 • Published Oct 16, 2024 • 30

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Paper • 2410.09732 • Published Oct 13, 2024 • 54

authored 2 papers 4 months ago

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Paper • 2410.09732 • Published Oct 13, 2024 • 54

Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining

Paper • 2410.08102 • Published Oct 10, 2024 • 20

upvoted a paper 4 months ago

Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining

Paper • 2410.08102 • Published Oct 10, 2024 • 20

authored a paper 4 months ago

MinerU: An Open-Source Solution for Precise Document Content Extraction

Paper • 2409.18839 • Published Sep 27, 2024 • 27

authored a paper 5 months ago

CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation

Paper • 2409.03643 • Published Sep 5, 2024 • 19

upvoted a paper 5 months ago

CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation

Paper • 2409.03643 • Published Sep 5, 2024 • 19