zwgao committed
Commit 1e5408f
1 Parent(s): 3094781

Update README.md

Files changed (1):
  1. README.md +64 -39
README.md CHANGED
@@ -53,49 +53,74 @@ InternVL 2.5 is a multimodal large language model series, featuring models of va
 
  ### Image Benchmarks
 
- | Benchmark | LLaVA-OneVision-0.5B | InternVL2-1B | InternVL2.5-1B | Qwen2-VL-2B | Aquila-VL-2B | InternVL2-2B | InternVL2.5-2B |
- |----------------------------|----------------------|--------------|----------------|-------------|--------------|--------------|----------------|
- | MMMU (val) | 31.4 | 36.7 | 40.9 | 41.1 | 47.4 | 36.3 | 43.6 |
- | MMMU (test) | - | 32.8 | 35.8 | - | - | 34.7 | 38.2 |
- | MMMU-PRO (overall) | - | 14.8 | 19.4 | 21.2 | 26.2 | - | 23.7 |
- | MathVista (mini) | 34.8 | 37.7 | 43.2 | 43.0 | 59.0 | 46.3 | 51.3 |
- | MathVision (mini) | - | 12.2 | 16.8 | 19.7 | 21.1 | 15.8 | 13.5 |
- | MathVision (full) | - | 11.1 | 14.4 | 12.4 | 18.4 | 12.1 | 14.7 |
- | MathVerse (mini) | 17.9 | 18.4 | 28.0 | 21.0 | 26.2 | 25.3 | 30.6 |
- | Olympiad Bench | - | 0.3 | 1.7 | - | - | 0.4 | 2.0 |
- | AI2D (w / wo M) | 57.1 / - | 64.1 / 70.5 | 69.3 / 77.8 | 74.7 / 84.6 | 75.0 / - | 74.1 / 82.3 | 74.9 / 83.5 |
- | ChartQA (test avg.) | 61.4 | 72.9 | 75.9 | 73.5 | 76.5 | 76.2 | 79.2 |
- | TextVQA (val) | - | 70.5 | 72.0 | 79.7 | 76.4 | 73.4 | 74.3 |
- | DocVQA (test) | 70.0 | 81.7 | 84.8 | 90.1 | 85.0 | 86.9 | 88.7 |
- | InfoVQA (test) | 41.8 | 50.9 | 56.0 | 65.5 | 58.3 | 58.9 | 60.9 |
- | OCR-Bench | 565 | 754 | 785 | 809 | 772 | 784 | 804 |
- | SEED-2 Plus | - | 54.3 | 59.0 | 62.4 | 63.0 | 60.0 | 60.9 |
- | CharXiv (RQ / DQ) | - | 18.1 / 30.7 | 19.0 / 38.4 | - | - | 21.0 / 40.6 | 21.3 / 49.7 |
- | VCR-EN-Easy (EM / Jaccard) | - | 21.5 / 48.4 | 91.5 / 97.0 | 81.5 / - | 70.0 / - | 32.9 / 59.2 | 93.2 / 97.6 |
- | BLINK (val) | 52.1 | 38.6 | 42.0 | 44.4 | - | 43.8 | 44.0 |
- | Mantis Eval | 39.6 | 46.1 | 51.2 | - | - | 48.4 | 54.8 |
- | MMIU | - | 37.3 | 38.5 | - | - | 39.8 | 43.5 |
- | Muir Bench | 25.5 | 29.3 | 29.9 | - | - | 32.5 | 40.6 |
- | MMT (val) | - | 49.5 | 50.3 | 55.1 | - | 50.4 | 54.5 |
- | MIRB (avg.) | - | 31.5 | 35.6 | - | - | 32.1 | 36.4 |
- | RealWorld QA | 55.6 | 50.3 | 57.5 | 62.6 | - | 57.3 | 60.1 |
- | MME-RW (EN) | - | 40.2 | 44.2 | - | - | 47.3 | 48.8 |
- | WildVision (win rate) | - | 17.8 | 43.4 | - | - | 31.8 | 44.2 |
- | R-Bench | - | 55.6 | 59.0 | - | - | 56.8 | 62.2 |
- | MME (sum) | 1438.0 | 1794.4 | 1950.5 | 1872.0 | - | 1876.8 | 2138.2 |
- | MMB (EN / CN) | 61.6 / 55.5 | 65.4 / 60.7 | 70.7 / 66.3 | 74.9 / 73.5 | - | 73.2 / 70.9 | 74.7 / 71.9 |
- | MMBv1.1 (EN) | 59.6 | 61.6 | 68.4 | 72.2 | - | 70.2 | 72.2 |
- | MMVet (turbo) | 32.2 | 32.7 | 48.8 | 49.5 | - | 39.5 | 60.8 |
- | MMVetv2 (0613) | - | 36.1 | 43.2 | - | - | 39.6 | 52.3 |
- | MMStar | 37.7 | 45.7 | 50.1 | 48.0 | - | 50.1 | 53.7 |
- | HallBench (avg.) | 27.9 | 34.0 | 39.0 | 41.7 | - | 37.9 | 42.6 |
- | MMHal (score) | - | 2.25 | 2.49 | - | - | 2.52 | 2.94 |
- | CRPE (relation) | - | 57.5 | 60.9 | - | - | 66.3 | 70.2 |
- | POPE (avg.) | - | 87.3 | 89.9 | - | - | 88.3 | 90.6 |
+ | Benchmark | LLaVA-OneVision-0.5B | InternVL2.5-1B | Qwen2-VL-2B | Aquila-VL-2B | InternVL2.5-2B |
+ |----------------------------|----------------------|----------------|-------------|--------------|----------------|
+ | MMMU (val) | 31.4 | 40.9 | 41.1 | 47.4 | 43.6 |
+ | MMMU (test) | - | 35.8 | - | - | 38.2 |
+ | MMMU-PRO (overall) | - | 19.4 | 21.2 | 26.2 | 23.7 |
+ | MathVista (mini) | 34.8 | 43.2 | 43.0 | 59.0 | 51.3 |
+ | MathVision (mini) | - | 16.8 | 19.7 | 21.1 | 13.5 |
+ | MathVision (full) | - | 14.4 | 12.4 | 18.4 | 14.7 |
+ | MathVerse (mini) | 17.9 | 28.0 | 21.0 | 26.2 | 30.6 |
+ | Olympiad Bench | - | 1.7 | - | - | 2.0 |
+ | AI2D (w / wo M) | 57.1 / - | 69.3 / 77.8 | 74.7 / 84.6 | 75.0 / - | 74.9 / 83.5 |
+ | ChartQA (test avg.) | 61.4 | 75.9 | 73.5 | 76.5 | 79.2 |
+ | TextVQA (val) | - | 72.0 | 79.7 | 76.4 | 74.3 |
+ | DocVQA (test) | 70.0 | 84.8 | 90.1 | 85.0 | 88.7 |
+ | InfoVQA (test) | 41.8 | 56.0 | 65.5 | 58.3 | 60.9 |
+ | OCR-Bench | 565 | 785 | 809 | 772 | 804 |
+ | SEED-2 Plus | - | 59.0 | 62.4 | 63.0 | 60.9 |
+ | CharXiv (RQ / DQ) | - | 19.0 / 38.4 | - | - | 21.3 / 49.7 |
+ | VCR-EN-Easy (EM / Jaccard) | - | 91.5 / 97.0 | 81.5 / - | 70.0 / - | 93.2 / 97.6 |
+ | BLINK (val) | 52.1 | 42.0 | 44.4 | - | 44.0 |
+ | Mantis Eval | 39.6 | 51.2 | - | - | 54.8 |
+ | MMIU | - | 38.5 | - | - | 43.5 |
+ | Muir Bench | 25.5 | 29.9 | - | - | 40.6 |
+ | MMT (val) | - | 50.3 | 55.1 | - | 54.5 |
+ | MIRB (avg.) | - | 35.6 | - | - | 36.4 |
+ | RealWorld QA | 55.6 | 57.5 | 62.6 | - | 60.1 |
+ | MME-RW (EN) | - | 44.2 | - | - | 48.8 |
+ | WildVision (win rate) | - | 43.4 | - | - | 44.2 |
+ | R-Bench | - | 59.0 | - | - | 62.2 |
+ | MME (sum) | 1438.0 | 1950.5 | 1872.0 | - | 2138.2 |
+ | MMB (EN / CN) | 61.6 / 55.5 | 70.7 / 66.3 | 74.9 / 73.5 | - | 74.7 / 71.9 |
+ | MMBv1.1 (EN) | 59.6 | 68.4 | 72.2 | - | 72.2 |
+ | MMVet (turbo) | 32.2 | 48.8 | 49.5 | - | 60.8 |
+ | MMVetv2 (0613) | - | 43.2 | - | - | 52.3 |
+ | MMStar | 37.7 | 50.1 | 48.0 | - | 53.7 |
+ | HallBench (avg.) | 27.9 | 39.0 | 41.7 | - | 42.6 |
+ | MMHal (score) | - | 2.49 | - | - | 2.94 |
+ | CRPE (relation) | - | 60.9 | - | - | 70.2 |
+ | POPE (avg.) | - | 89.9 | - | - | 90.6 |
 
 
  ### Video Benchmarks
 
+ | Model Name | Video-MME (wo / w sub) | MVBench | MMBench-Video (val) | MLVU (M-Avg) | LongVideoBench (val total) | CG-Bench v1.1 (long / clue acc.) |
+ |----------------------|------------------------|---------|---------------------|--------------|----------------------------|----------------------------------|
+ | **InternVL2.5-1B** | 50.3 / 52.3 | 64.3 | 1.36 | 57.3 | 47.9 | - |
+ | Qwen2-VL-2B | 55.6 / 60.4 | 63.2 | - | - | - | - |
+ | **InternVL2.5-2B** | 51.9 / 54.1 | 68.8 | 1.44 | 61.4 | 52.0 | - |
+ | **InternVL2.5-4B** | 62.3 / 63.6 | 71.6 | 1.73 | 68.3 | 55.2 | - |
+ | VideoChat2-HD | 45.3 / 55.7 | 62.3 | 1.22 | 47.9 | - | - |
+ | MiniCPM-V-2.6 | 60.9 / 63.6 | - | 1.70 | - | 54.9 | - |
+ | LLaVA-OneVision-7B | 58.2 / - | 56.7 | - | - | - | - |
+ | Qwen2-VL-7B | 63.3 / 69.0 | 67.0 | 1.44 | - | 55.6 | - |
+ | **InternVL2.5-8B** | 64.2 / 66.9 | 72.0 | 1.68 | 68.9 | 60.0 | - |
+ | **InternVL2.5-26B** | 66.9 / 69.2 | 75.2 | 1.86 | 72.3 | 59.9 | - |
+ | Oryx-1.5-32B | 67.3 / 74.9 | 70.1 | 1.52 | 72.3 | - | - |
+ | VILA-1.5-40B | 60.1 / 61.1 | - | 1.61 | 56.7 | - | - |
+ | **InternVL2.5-38B** | 70.7 / 73.1 | 74.4 | 1.82 | 75.3 | 63.3 | - |
+ | GPT-4V/4T | 59.9 / 63.3 | 43.7 | 1.53 | 49.2 | 59.1 | - |
+ | GPT-4o-20240513 | 71.9 / 77.2 | - | 1.63 | 64.6 | 66.7 | - |
+ | GPT-4o-20240806 | - | - | 1.87 | - | - | - |
+ | Gemini-1.5-Pro | 75.0 / 81.3 | - | 1.30 | - | 64.0 | - |
+ | VideoLLaMA2-72B | 61.4 / 63.1 | 62.0 | - | - | - | - |
+ | LLaVA-OneVision-72B | 66.2 / 69.5 | 59.4 | - | 66.4 | 61.3 | - |
+ | Qwen2-VL-72B | 71.2 / 77.8 | 73.6 | 1.70 | - | - | 41.3 / 56.2 |
+ | InternVL2-Llama3-76B | 64.7 / 67.8 | 69.6 | 1.71 | 69.9 | 61.1 | - |
+ | **InternVL2.5-78B** | 72.1 / 74.0 | 76.4 | 1.97 | 75.7 | 63.6 | 42.2 / 58.5 |
+
  ### Multimodal Multilingual Understanding
 
  <table>