AI & ML interests
Data-centric AI, LLM, MLLM
Organization Card
About OpenDataArena
OpenDataArena (ODA) is an open research initiative devoted to evaluating, benchmarking, and creating high-value datasets for the post-training era of large language models (LLMs).
We believe data quality defines model capability, and that open, reproducible evaluation is key to accelerating progress in AI.
Our Mission
To make data evaluation scientific, transparent, and community-driven, while continuously producing high-value, openly available datasets that enhance model alignment and reasoning ability.
Key Features
- Dataset Leaderboard: ranks the most valuable datasets across multiple domains, based on diverse benchmarks.
- Comprehensive Scoring System: a scoring tool that measures dataset quality, diversity, and learning value using reproducible pipelines.
- Open-Source Toolkit: OpenDataArena-Tool enables dataset evaluation and scoring with a standardized, community-driven workflow.
- High-Value Data Generation: beyond evaluation, ODA continuously produces and shares new, top-quality datasets for fine-tuning and alignment research.
If you find our work helpful, please consider starring and subscribing to support open, data-driven AI research. Learn more at opendataarena.github.io.
(OpenDataArena is part of OpenDataLab).
High-quality STEM reasoning dataset for Multimodal LLM post-training.
- MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
  Paper • 2601.21821 • Published • 58
- OpenDataArena/MMFineReason-1.8M-Qwen3-VL-235B-Thinking
  Viewer • Updated • 1.81M • 3.28k • 110
- OpenDataArena/MMFineReason-SFT-586K-Qwen3-VL-235B-Thinking
  Viewer • Updated • 586k • 466 • 4
- OpenDataArena/MMFineReason-SFT-123K-Qwen3-VL-235B-Thinking
  Viewer • Updated • 123k • 671 • 68
High-quality mixture datasets for post-training covering multiple domains.
- OpenDataArena/ODA-Mixture-500k
  Viewer • Updated • 506k • 3.07k • 122
- OpenDataArena/ODA-Mixture-100k
  Viewer • Updated • 101k • 1.64k • 96
- OpenDataArena/Qwen2.5-7B-ODA-Mixture-500k
  Text Generation • 333k • Updated • 10 • 2
- OpenDataArena/Qwen2.5-7B-ODA-Mixture-100k
  Text Generation • 333k • Updated • 11
models (9)
- OpenDataArena/MMFineReason-4B
  Visual Question Answering • 5B • Updated • 54 • 13
- OpenDataArena/MMFineReason-2B
  Visual Question Answering • 2B • Updated • 25 • 8
- OpenDataArena/MMFineReason-8B
  Visual Question Answering • 9B • Updated • 19 • 8
- OpenDataArena/Qwen3-8B-ODA-Math-460k
  Text Generation • 308k • Updated • 14 • 1
- OpenDataArena/Qwen2.5-7B-ODA-Math-460k
  Text Generation • 8B • Updated • 15
- OpenDataArena/Qwen3-8B-ODA-Mixture-100k
  Text Generation • 308k • Updated • 44 • 1
- OpenDataArena/Qwen3-8B-ODA-Mixture-500k
  Text Generation • 308k • Updated • 20
- OpenDataArena/Qwen2.5-7B-ODA-Mixture-100k
  Text Generation • 333k • Updated • 11
- OpenDataArena/Qwen2.5-7B-ODA-Mixture-500k
  Text Generation • 333k • Updated • 10 • 2
datasets (9)
- OpenDataArena/MMFineReason-SFT-123K-Qwen3-VL-235B-Thinking
  Viewer • Updated • 123k • 671 • 68
- OpenDataArena/MMFineReason-SFT-586K-Qwen3-VL-235B-Thinking
  Viewer • Updated • 586k • 466 • 4
- OpenDataArena/MMFineReason-Full-2.3M-Qwen3-VL-235B-Thinking
  Viewer • Updated • 2.29M • 2.95k • 59
- OpenDataArena/MMFineReason-1.8M-Qwen3-VL-235B-Thinking
  Viewer • Updated • 1.81M • 3.28k • 110
- OpenDataArena/ODA-Math-460k
  Viewer • Updated • 460k • 3.81k • 103
- OpenDataArena/ODA-Mixture-100k
  Viewer • Updated • 101k • 1.64k • 96
- OpenDataArena/ODA-Mixture-500k
  Viewer • Updated • 506k • 3.07k • 122
- OpenDataArena/OpenDataArena-scored-data
  Viewer • Updated • 15.7M • 11.2k • 10
- OpenDataArena/MathLake
  Viewer • Updated • 8.31M • 1.22k • 21