Model (variant),Yes-or-No,What,How,Distortion,Other,In-context Distortion,In-context Other,Overall
InfiMM (Zephyr-7B),61.31,56.61,49.58,47.79,62.05,51.71,67.68,56.05
Emu2-Chat (LLaMA-33B),70.09,65.12,54.11,66.22,62.96,63.47,73.21,64.32
Fuyu-8B (Persimmon-8B),62.22,35.79,36.62,41.07,49.4,45.89,49.04,45.75
BakLLava (Mistral-7B),66.46,61.48,54.83,51.33,63.76,56.52,78.16,61.02
SPHINX,74.45,65.5,62.13,59.11,73.26,66.09,77.56,67.69
mPLUG-Owl2 (LLaMA-7B),72.26,55.53,58.64,52.59,71.36,58.9,73.0,62.68
LLaVA-v1.5 (Vicuna-v1.5-7B),64.6,59.22,55.76,47.98,67.3,58.9,73.76,60.07
LLaVA-v1.5 (Vicuna-v1.5-13B),64.96,64.86,54.12,53.55,66.59,58.9,71.48,61.4
InternLM-XComposer-VL (InternLM),68.43,62.04,61.93,56.81,70.41,57.53,77.19,64.35
IDEFICS-Instruct (LLaMA-7B),60.04,46.42,46.71,40.38,59.9,47.26,64.77,51.51
Qwen-VL (QwenLM),65.33,60.74,58.44,54.13,66.35,58.22,73.0,61.67
Shikra (Vicuna-7B),69.09,47.93,46.71,47.31,60.86,53.08,64.77,55.32
Otter-v1 (MPT-7B),57.66,39.7,42.59,42.12,48.93,47.6,54.17,47.22
InstructBLIP (Flan-T5-XL),69.53,59.0,56.17,57.31,65.63,56.51,71.21,61.94
InstructBLIP (Vicuna-7B),70.99,51.41,43.0,45.0,63.01,57.19,64.39,55.85
VisualGLM-6B (GLM-6B),61.31,53.58,44.03,48.56,54.89,55.48,57.79,53.31
mPLUG-Owl (LLaMA-7B),72.45,54.88,47.53,49.62,63.01,62.67,66.67,58.93
LLaMA-Adapter-V2,66.61,54.66,51.65,56.15,61.81,59.25,54.55,58.06
LLaVA-v1 (Vicuna-13B),57.12,54.88,51.85,45.58,58.0,57.19,64.77,54.72
MiniGPT-4 (Vicuna-13B),60.77,50.33,43.0,45.58,52.51,53.42,60.98,51.77
Qwen-VL-Plus (Closed-Source),75.74,73.25,57.33,64.88,73.24,68.67,70.56,68.93
Qwen-VL-Max (Closed-Source),73.2,81.02,68.39,70.84,74.57,73.11,80.44,73.9
Gemini-Pro (Closed-Source),71.26,71.39,65.59,67.3,73.04,65.88,73.6,69.46
GPT-4V (Closed-Source),77.72,78.39,66.45,71.01,71.07,79.36,78.91,74.1