yhyu13 committed on
Commit 79ec3a4
1 parent: 33bd671

Add alpaca eval

README.md CHANGED
@@ -14,6 +14,27 @@ https://huggingface.co/rishiraj/meow
 
 who rank #1 and #2 among models <13B in the https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard by 2023/12/20.
 
+# Alpaca Eval
+
+I am thrilled to announce that SOLAR-10.7B-LMCocktail ranks second on AlpacaEval, just behind GPT-4, in my local community run with ChatGPT as the annotator (win rate against the text_davinci_003 reference outputs). The full leaderboard and run artifacts are in [./alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/](./alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/).
+
+```
+                             win_rate  standard_error  n_total  avg_length
+gpt4                            73.79            1.54      805        1365
+SOLAR-10.7B-LMCocktail(new)     73.45            1.56      804        1203
+claude                          70.37            1.60      805        1082
+chatgpt                         66.09            1.66      805         811
+wizardlm-13b                    65.16            1.67      805         985
+vicuna-13b                      64.10            1.69      805        1037
+guanaco-65b                     62.36            1.71      805        1249
+oasst-rlhf-llama-33b            62.05            1.71      805        1079
+alpaca-farm-ppo-human           60.25            1.72      805         803
+falcon-40b-instruct             56.52            1.74      805         662
+text_davinci_003                50.00            0.00      805         307
+alpaca-7b                       45.22            1.74      805         396
+text_davinci_001                28.07            1.56      805         296
+```
+
 
 # Code
 
alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/alpaca_eval_log.txt ADDED
The diff for this file is too large to render.
 
alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/annotation_chatgpt_fn.json ADDED
The diff for this file is too large to render.
 
alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/leaderboard.csv ADDED
@@ -0,0 +1,15 @@
+,win_rate,standard_error,n_wins,n_wins_base,n_draws,n_total,mode,avg_length
+gpt4,73.7888198757764,1.5359801545073597,588,205,12,805,minimal,1365
+SOLAR-10.7B-LMCocktail,73.44527363184079,1.5572150363643398,590,213,1,804,community,1203
+claude,70.37267080745342,1.599519507147828,562,234,9,805,minimal,1082
+chatgpt,66.08695652173913,1.6626479994330317,529,270,6,805,minimal,811
+wizardlm-13b,65.15527950310559,1.670034107787565,520,276,9,805,minimal,985
+vicuna-13b,64.09937888198758,1.6895185863153146,515,288,2,805,minimal,1037
+guanaco-65b,62.36024844720497,1.7086348811605765,502,303,0,805,minimal,1249
+oasst-rlhf-llama-33b,62.0496894409938,1.7080028976103514,498,304,3,805,minimal,1079
+alpaca-farm-ppo-human,60.24844720496895,1.7169496733548772,481,316,8,805,minimal,803
+falcon-40b-instruct,56.52173913043478,1.7438750520312944,453,348,4,805,minimal,662
+phi-2-alpaca-gpt4-dpo,55.59701492537313,1.7533719245384989,447,357,0,804,community,4532
+text_davinci_003,50.0,0.0,0,0,805,805,minimal,307
+alpaca-7b,45.21739130434783,1.7375846781579476,356,433,16,805,minimal,396
+text_davinci_001,28.07453416149068,1.5602183426587484,216,569,20,805,minimal,296
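The numbers in leaderboard.csv appear to follow two simple relations: win_rate equals 100 × (n_wins + n_draws / 2) / n_total, and standard_error matches the sample standard error (ddof=1) of per-prompt scores of 1 for a win, 0.5 for a draw and 0 for a loss, expressed in percent. The sketch below recomputes both columns from the counts for the top two rows and then expresses the GPT-4 vs SOLAR-10.7B-LMCocktail gap in standard errors. It is a hypothetical check written against the numbers in this commit, not AlpacaEval's own code. The helper names (`load_rows`, `metrics_from_counts`, `check_row`, `gap_in_standard_errors`) are made up, reading `n_wins_base` as the number of prompts the reference output won is an assumption, and combining the two standard errors as if the runs were independent is a simplification, since both models were judged on the same prompts.

```python
import csv
import io
import math

# Two rows copied verbatim from leaderboard.csv. The first column has an
# empty header and holds the model name.
CSV_TEXT = """,win_rate,standard_error,n_wins,n_wins_base,n_draws,n_total,mode,avg_length
gpt4,73.7888198757764,1.5359801545073597,588,205,12,805,minimal,1365
SOLAR-10.7B-LMCocktail,73.44527363184079,1.5572150363643398,590,213,1,804,community,1203
"""


def load_rows(text: str) -> dict[str, dict[str, str]]:
    """Map model name -> CSV row (hypothetical helper)."""
    return {row[""]: row for row in csv.DictReader(io.StringIO(text))}


def metrics_from_counts(n_wins: int, n_losses: int, n_draws: int) -> tuple[float, float]:
    """Return (win_rate, standard_error), both in percent.

    Each prompt scores 1 for a win, 0.5 for a draw and 0 for a loss; the
    standard error is the sample standard deviation (ddof=1) over sqrt(n).
    """
    n = n_wins + n_losses + n_draws
    mean = (n_wins + 0.5 * n_draws) / n
    sum_sq = n_wins + 0.25 * n_draws  # sum of squared per-prompt scores
    sample_var = max(sum_sq - n * mean**2, 0.0) / (n - 1)
    return 100 * mean, 100 * math.sqrt(sample_var / n)


def check_row(row: dict[str, str]) -> tuple[float, float]:
    """Recompute one row and compare it with the stored columns."""
    # Assumption: n_wins_base counts prompts where the reference output won.
    n_wins, n_losses, n_draws = (int(row[k]) for k in ("n_wins", "n_wins_base", "n_draws"))
    assert n_wins + n_losses + n_draws == int(row["n_total"])
    win_rate, se = metrics_from_counts(n_wins, n_losses, n_draws)
    assert math.isclose(win_rate, float(row["win_rate"]), rel_tol=1e-6)
    assert math.isclose(se, float(row["standard_error"]), rel_tol=1e-6)
    return win_rate, se


def gap_in_standard_errors(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Win-rate gap divided by the combined standard error.

    Simplification: treats the two runs as independent (standard errors
    added in quadrature), although both models saw the same prompts.
    """
    return (a[0] - b[0]) / math.hypot(a[1], b[1])


if __name__ == "__main__":
    results = {name: check_row(row) for name, row in load_rows(CSV_TEXT).items()}
    for name, (win_rate, se) in results.items():
        print(f"{name:<24} win_rate={win_rate:.2f} standard_error={se:.2f}")
    z = gap_in_standard_errors(results["gpt4"], results["SOLAR-10.7B-LMCocktail"])
    print(f"gpt4 vs SOLAR-10.7B-LMCocktail: gap = {z:.2f} combined standard errors")
```

With the two rows above, the gap comes out at roughly 0.16 combined standard errors.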
alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/model_outputs.json ADDED
The diff for this file is too large to render.
 
alpaca_eval/chatgpt_fn_--SOLAR-10-7B-LMCocktail/reference_outputs.json ADDED
The diff for this file is too large to render.