LLM_leaderboard / basic-leaderboard.csv
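Rank,Model,Organization,Open-Ended QA,Content Creation,Cross-Lingual Translation,Summarization,Multi-Turn Dialogue,Instruction Following,Logic and Reasoning,Scenario Simulation,Role Play,Overall Score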
🥇,GPT4-Turbo,OpenAI,94.29,74.50,78.31,75.34,95.71,89.52,80.00,80.64,75.00,82.59
🥈,GPT4,OpenAI,82.90,70.06,79.76,77.55,96.07,84.29,76.25,77.93,80.60,80.60
🥉,ERNIE Bot 4 (ERNIE-Bot4.0),Baidu,79.64,71.15,77.98,84.44,98.93,84.29,80.00,70.43,73.45,80.03
4,Tongyi Qianwen 2 (qwen-max),Alibaba,76.96,66.03,76.34,74.06,92.50,80.00,71.25,72.43,67.44,75.22
5,GPT3.5-Turbo,OpenAI,81.43,67.77,72.32,61.05,92.14,77.50,48.75,77.50,78.21,72.96
6,iFLYTEK Spark v3.0,iFLYTEK,80.58,67.49,71.50,76.79,76.43,72.14,63.75,71.00,73.81,72.61
7,SenseTime Ririxin (Sensenova),SenseTime,78.35,62.55,74.96,77.21,70.71,71.43,62.50,74.29,69.64,71.29
8,MiniMax (abab5.5-chat),MiniMax,80.36,66.94,59.00,77.30,88.93,71.07,55.00,73.50,68.81,71.21
9,ChatGLM3-6B,Tsinghua & Zhipu AI,80.27,59.07,66.00,81.04,96.79,72.62,51.25,61.43,65.00,70.38
10,360 Zhinao (360GPT_S2_V9),360,64.64,57.88,69.87,67.60,98.93,66.96,58.75,60.14,62.74,67.50
11,Baichuan (baichuan2-13b-chat-v1),Baichuan AI,75.49,52.38,72.73,59.44,80.71,62.44,16.25,58.50,63.33,60.14
12,Qianfan-llama2,Meta/Baidu Qianfan,81.74,50.28,60.23,67.18,30.71,58.57,46.25,57.79,60.60,57.04
13,Wudao Aquila (AquilaChat-7B),BAAI,66.52,52.29,69.16,69.73,70.00,50.77,22.50,54.57,55.24,56.75
14,BLOOMZ-7B,BigScience,59.42,39.38,58.11,69.56,69.29,41.43,20.00,44.50,46.55,49.80