Rank,Model,Organization,Open-ended Q&A,Content Creation,Cross-lingual Translation,Summarization,Multi-turn Dialogue,Instruction Following,Logic and Reasoning,Scenario Simulation,Role Play,Overall Score
🥇,GPT4-Turbo,OpenAI,94.29,74.50,78.31,75.34,95.71,89.52,80.00,80.64,75.00,82.59
🥈,GPT4,OpenAI,82.90,70.06,79.76,77.55,96.07,84.29,76.25,77.93,80.60,80.60
🥉,Wenxin Yiyan 4 (ERNIE-Bot4.0),Baidu,79.64,71.15,77.98,84.44,98.93,84.29,80.00,70.43,73.45,80.03
4,Tongyi Qianwen 2 (qwen-max),Alibaba,76.96,66.03,76.34,74.06,92.50,80.00,71.25,72.43,67.44,75.22
5,GPT3.5-Turbo,OpenAI,81.43,67.77,72.32,61.05,92.14,77.50,48.75,77.50,78.21,72.96
6,iFLYTEK Spark v3.0,iFLYTEK,80.58,67.49,71.50,76.79,76.43,72.14,63.75,71.00,73.81,72.61
7,SenseTime Ririxin (Sensenova),SenseTime,78.35,62.55,74.96,77.21,70.71,71.43,62.50,74.29,69.64,71.29
8,MiniMax (abab5.5-chat),MiniMax,80.36,66.94,59.00,77.30,88.93,71.07,55.00,73.50,68.81,71.21
9,ChatGLM3-6B,Tsinghua & Zhipu AI,80.27,59.07,66.00,81.04,96.79,72.62,51.25,61.43,65.00,70.38
10,360 Zhinao (360GPT_S2_V9),360,64.64,57.88,69.87,67.60,98.93,66.96,58.75,60.14,62.74,67.50
11,Baichuan (baichuan2-13b-chat-v1),Baichuan AI,75.49,52.38,72.73,59.44,80.71,62.44,16.25,58.50,63.33,60.14
12,Qianfan-llama2,Meta/Baidu Qianfan,81.74,50.28,60.23,67.18,30.71,58.57,46.25,57.79,60.60,57.04
13,Wudao Aquila (AquilaChat-7B),BAAI,66.52,52.29,69.16,69.73,70.00,50.77,22.50,54.57,55.24,56.75
14,BLOOMZ-7B,BigScience,59.42,39.38,58.11,69.56,69.29,41.43,20.00,44.50,46.55,49.80
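
The table does not say how the overall score in the last column is derived. For the rows checked here, it matches the unweighted mean of the nine category scores, rounded to two decimals. The sketch below recomputes that mean for three rows copied from the table. The aggregation rule is an assumption inferred from the numbers, not something the leaderboard states, and the names `ROWS` and `recomputed_overall` are made up for illustration.

```python
from statistics import mean

# Three rows copied from the leaderboard above:
# nine category scores in column order, followed by the listed overall score.
ROWS = {
    "GPT4-Turbo": ([94.29, 74.50, 78.31, 75.34, 95.71, 89.52, 80.00, 80.64, 75.00], 82.59),
    "GPT4": ([82.90, 70.06, 79.76, 77.55, 96.07, 84.29, 76.25, 77.93, 80.60], 80.60),
    "BLOOMZ-7B": ([59.42, 39.38, 58.11, 69.56, 69.29, 41.43, 20.00, 44.50, 46.55], 49.80),
}


def recomputed_overall(category_scores):
    """Return the unweighted mean of the category scores, rounded to two decimals.

    Assumption: the leaderboard aggregates with a plain arithmetic mean.
    """
    return round(mean(category_scores), 2)


if __name__ == "__main__":
    for model, (scores, listed) in ROWS.items():
        # Print the recomputed mean next to the value listed in the table.
        print(f"{model}: recomputed={recomputed_overall(scores):.2f} listed={listed:.2f}")
```

A plain mean weights every category equally, so a single very low category score (for example, a low Logic and Reasoning value) pulls the overall score down by one ninth of the gap.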