Add alpaca eval

Files changed (6)

README.md CHANGED

@@ -14,6 +14,27 @@ https://huggingface.co/rishiraj/meow
 who rank #1 and #2 among models <13B in the https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard by 2023/12/20.
+# Alpaca Eval
+I am thrilled to announce that, in my local community run of AlpacaEval with ChatGPT as the annotator, LMCocktail 10.7B ranked second, just behind GPT4. You can also check the leaderboard at [./alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/](./alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/)
+```
+                             win_rate  standard_error  n_total  avg_length
+gpt4                            73.79            1.54      805        1365
+SOLAR-10.7B-LMCocktail(new)     73.45            1.56      804        1203
+claude                          70.37            1.60      805        1082
+chatgpt                         66.09            1.66      805         811
+wizardlm-13b                    65.16            1.67      805         985
+vicuna-13b                      64.10            1.69      805        1037
+guanaco-65b                     62.36            1.71      805        1249
+oasst-rlhf-llama-33b            62.05            1.71      805        1079
+alpaca-farm-ppo-human           60.25            1.72      805         803
+falcon-40b-instruct             56.52            1.74      805         662
+text_davinci_003                50.00            0.00      805         307
+alpaca-7b                       45.22            1.74      805         396
+text_davinci_001                28.07            1.56      805         296
+```
 # Code
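
Reading note on the table above: the `standard_error` column gives a rough sense of how far apart two rows really are. The sketch below compares the top two rows using only the rounded numbers from the README table. It treats the two win rates as independent estimates, which is a simplifying assumption (both models were scored on largely the same prompts), and `gap_in_standard_errors` is a hypothetical helper name, not part of AlpacaEval.

```python
import math


def gap_in_standard_errors(win_a, se_a, win_b, se_b):
    """Return (gap, combined_se, z) for two win rates given in percent.

    Assumes the two estimates are independent, so their standard errors
    combine in quadrature. This is an approximation for illustration only.
    """
    gap = win_a - win_b
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return gap, combined_se, gap / combined_se


# Rounded values copied from the README table: gpt4 vs SOLAR-10.7B-LMCocktail.
gap, combined_se, z = gap_in_standard_errors(73.79, 1.54, 73.45, 1.56)
print(f"gap={gap:.2f} combined_se={combined_se:.2f} z={z:.2f}")
```

Under that assumption the gap between the first two rows is well under one combined standard error, so the ordering of those two rows should be read with some caution.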

alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/alpaca_eval_log.txt ADDED

The diff for this file is too large to render. See raw diff

alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/annotation_chatgpt_fn.json ADDED

The diff for this file is too large to render. See raw diff

alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/leaderboard.csv ADDED

+,win_rate,standard_error,n_wins,n_wins_base,n_draws,n_total,mode,avg_length
+gpt4,73.7888198757764,1.5359801545073597,588,205,12,805,minimal,1365
+SOLAR-10.7B-LMCocktail,73.44527363184079,1.5572150363643398,590,213,1,804,community,1203
+claude,70.37267080745342,1.599519507147828,562,234,9,805,minimal,1082
+chatgpt,66.08695652173913,1.6626479994330317,529,270,6,805,minimal,811
+wizardlm-13b,65.15527950310559,1.670034107787565,520,276,9,805,minimal,985
+vicuna-13b,64.09937888198758,1.6895185863153146,515,288,2,805,minimal,1037
+guanaco-65b,62.36024844720497,1.7086348811605765,502,303,0,805,minimal,1249
+oasst-rlhf-llama-33b,62.0496894409938,1.7080028976103514,498,304,3,805,minimal,1079
+alpaca-farm-ppo-human,60.24844720496895,1.7169496733548772,481,316,8,805,minimal,803
+falcon-40b-instruct,56.52173913043478,1.7438750520312944,453,348,4,805,minimal,662
+phi-2-alpaca-gpt4-dpo,55.59701492537313,1.7533719245384989,447,357,0,804,community,4532
+text_davinci_003,50.0,0.0,0,0,805,805,minimal,307
+alpaca-7b,45.21739130434783,1.7375846781579476,356,433,16,805,minimal,396
+text_davinci_001,28.07453416149068,1.5602183426587484,216,569,20,805,minimal,296
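
The `win_rate` and `standard_error` columns in this CSV appear to follow from the count columns: the values are consistent with scoring a win as 1, a draw as 0.5, and a loss as 0, then taking the mean and the standard error of the mean (with the n-1 denominator), both scaled to percent. The sketch below recomputes them for three rows copied from the file. The formula is inferred from the numbers rather than taken from the AlpacaEval source, and `win_rate_and_se` is a hypothetical helper name.

```python
import math


def win_rate_and_se(n_wins, n_wins_base, n_draws):
    """Recompute (win_rate, standard_error) in percent from the counts.

    Assumed scoring, inferred from the CSV: win = 1, draw = 0.5, loss = 0.
    """
    n_total = n_wins + n_wins_base + n_draws
    mean = (n_wins + 0.5 * n_draws) / n_total
    # Sum of squared deviations from the mean over all per-example scores.
    squared_dev = (
        n_wins * (1.0 - mean) ** 2
        + n_wins_base * (0.0 - mean) ** 2
        + n_draws * (0.5 - mean) ** 2
    )
    sample_std = math.sqrt(squared_dev / (n_total - 1))
    return 100 * mean, 100 * sample_std / math.sqrt(n_total)


# (n_wins, n_wins_base, n_draws) copied from leaderboard.csv.
rows = {
    "gpt4": (588, 205, 12),
    "SOLAR-10.7B-LMCocktail": (590, 213, 1),
    "text_davinci_003": (0, 0, 805),
}
results = {name: win_rate_and_se(*counts) for name, counts in rows.items()}
for name, (win_rate, se) in results.items():
    print(f"{name}: win_rate={win_rate:.2f} standard_error={se:.2f}")
```

The `text_davinci_003` row is the baseline compared against itself, which is why every example is a draw and its standard error is zero.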

alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/model_outputs.json ADDED

The diff for this file is too large to render. See raw diff

alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/reference_outputs.json ADDED

The diff for this file is too large to render. See raw diff