Model,Language Model,Open Source,Text Recognition,Scene Text-Centric VQA,Doc-Oriented VQA,KIE,HMER,Final Score,Link
MiniCPM-V 2.6,Qwen2-7B,Yes,261,186,176,181,48,852,https://github.com/OpenBMB/MiniCPM-V
MiniMonkey,internlm2-chat-1_8b,Yes,251,174,141,169,71,806,https://arxiv.org/abs/2408.02034
H2OVL-Mississippi-2B,H2O-Danube2-1.8B,Yes,252,171,140,166,53,782,https://huggingface.co/h2oai/h2ovl-mississippi-2b
InternVL2-1B,Qwen2-0.5B-Instruct,Yes,255,166,130,156,72,779,https://huggingface.co/OpenGVLab/InternVL2-1B
InternVL2-4B,Phi-3-mini-128k-instruct,Yes,235,170,138,164,69,776,https://huggingface.co/OpenGVLab/InternVL2-4B
InternVL2-2B,internlm2-chat-1_8b,Yes,245,172,122,167,62,768,https://huggingface.co/OpenGVLab/InternVL2-2B
H2OVL-Mississippi-0.8B,H2O-Danube3-0.5B,Yes,274,162,112,152,51,751,https://huggingface.co/h2oai/h2ovl-mississippi-800m
Qwen-VL-Max,-,No,254,166,148,143,12,723,https://github.com/QwenLM/Qwen-VL
Qwen-VL-Plus,-,No,248,155,141,141,9,694,https://github.com/QwenLM/Qwen-VL
Gemini,-,No,215,174,128,134,8,659,https://deepmind.google/technologies/gemini/
GPT4V,-,No,167,163,146,160,9,645,https://openai.com/
MiniCPM-V-2,MiniCPM-2.4B,Yes,245,171,103,86,0,605,https://github.com/OpenBMB/MiniCPM-V
mPLUG-DocOwl1.5,LLaMA-2 7B,Yes,182,157,126,134,0,599,https://arxiv.org/abs/2403.12895
TextMonkey,Qwen-7B,Yes,169,164,115,113,0,561,https://export.arxiv.org/abs/2403.04473
InternVL-Chat-Chinese,LLaMA2-13B,Yes,228,153,72,64,0,517,https://arxiv.org/abs/2312.14238
Monkey,Qwen-7B,Yes,174,161,91,88,0,514,https://arxiv.org/abs/2311.06607
InternLM-XComposer2,InternLM2-7B,Yes,160,160,103,87,1,511,https://arxiv.org/abs/2401.16420
QwenVL,Qwen-7B,Yes,179,157,95,75,0,506,https://arxiv.org/abs/2308.12966
mPLUG-Owl2,LLaMA2-7B,Yes,153,153,41,19,0,366,https://arxiv.org/abs/2311.04257
LLaVAR,LLaMA-13B,Yes,186,122,25,13,0,346,https://arxiv.org/abs/2306.17107
LLaVA1.5-13B,Vicuna-v1.5-13B,Yes,176,129,19,7,0,331,https://arxiv.org/abs/2310.03744
InternLM-XComposer,InternLM-7B,Yes,192,91,14,6,0,303,https://arxiv.org/abs/2309.15112
LLaVA1.5-7B,Vicuna-v1.5-7B,Yes,160,117,15,5,0,297,https://arxiv.org/abs/2310.03744
mPLUG-Owl,LLaMA-2 7B,Yes,172,104,18,3,0,297,https://arxiv.org/abs/2304.14178
BLIVA,Vicuna-7B,Yes,165,103,22,1,0,291,https://arxiv.org/abs/2308.09936
InstructBLIP,Vicuna-7B,Yes,168,93,14,1,0,276,https://arxiv.org/abs/2305.06500
BLIP2-6.7B,OPT-6.7B,Yes,154,71,10,0,0,235,https://arxiv.org/abs/2301.12597
MiniGPT4V2,LLaMA2-13B,Yes,124,29,4,0,0,157,https://arxiv.org/abs/2310.09478