Path to Multimodal Generalist

community

https://generalist.top/

path2generalist

AI & ML interests

Multimodal Generalist

Recent Activity

marinero4972

authored 2 papers 13 days ago

DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World

Paper • 2506.24102 • Published Jun 30

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Paper • 2510.20579 • Published 14 days ago • 54

marinero4972

authored a paper 15 days ago

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Paper • 2510.18876 • Published 16 days ago • 35

ChocoWu

updated 2 datasets 3 months ago

General-Level/General-Bench-Closeset

Updated Aug 4 • 1.74k • 2

General-Level/General-Bench-Openset

Updated Aug 4 • 15.2k • 4

LXT

authored 3 papers 4 months ago

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

Paper • 2505.24164 • Published May 30

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions

Paper • 2506.13691 • Published Jun 16 • 2

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

Paper • 2507.07999 • Published Jul 10 • 49

LXT

authored 8 papers 5 months ago

OmniAudio: Generating Spatial Audio from 360-Degree Video

Paper • 2504.14906 • Published Apr 21

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Paper • 2406.05127 • Published Jun 7, 2024

So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

Paper • 2505.18660 • Published May 24 • 1

PixelThink: Towards Efficient Chain-of-Pixel Reasoning

Paper • 2505.23727 • Published May 29 • 5

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

Paper • 2505.23606 • Published May 29 • 14

Conditional Panoramic Image Generation via Masked Autoregressive Modeling

Paper • 2505.16862 • Published May 22

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query

Paper • 2506.03144 • Published Jun 3 • 7

BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation

Paper • 2505.12620 • Published May 19

marinero4972

authored a paper 5 months ago

CyberV: Cybernetics for Test-time Scaling in Video Understanding

Paper • 2506.07971 • Published Jun 9 • 5

LXT

authored 2 papers 5 months ago

CyberV: Cybernetics for Test-time Scaling in Video Understanding

Paper • 2506.07971 • Published Jun 9 • 5

DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

Paper • 2505.21541 • Published May 24 • 7

QingyuShi

authored a paper 5 months ago

Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model

Paper • 2505.23606 • Published May 29 • 14