RaymondAISG committed
Commit 844e79e
1 Parent(s): 70a8b9f

Update README.md

Files changed (1): README.md (+1 -2)
README.md CHANGED
@@ -56,8 +56,7 @@ IFEval evaluates a model's ability to adhere to constraints provided in the prompt
 MT-Bench evaluates a model's ability to engage in multi-turn (2 turns) conversations and respond in ways that align with human needs. We use `gpt-4-1106-preview` as the judge model and compare against `gpt-3.5-turbo-0125` as the baseline model. The metric used is the weighted win rate against the baseline model (i.e. average win rate across each category (Math, Reasoning, STEM, Humanities, Roleplay, Writing, Extraction)). A tie is given a score of 0.5.
 
 
-For more details on Llama3 8B CPT SEA-LIONv2 Instruct benchmark performance, please refer to the [SEA HELM leaderboard](https://leaderboard.sea-lion.ai/),
-https://leaderboard.sea-lion.ai/
+For more details on Llama3 8B CPT SEA-LIONv2 Instruct benchmark performance, please refer to the SEA HELM leaderboard, https://leaderboard.sea-lion.ai/
 
 
 ### Usage
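
As a side note on the MT-Bench scoring described in the diff above, the following is a minimal sketch of how a weighted win rate with ties scored at 0.5 can be computed. The `weighted_win_rate` function and the judgement tuples are illustrative assumptions, not part of the actual SEA-LION evaluation harness.

```python
# A minimal sketch (not the SEA-LION evaluation code) of the MT-Bench
# scoring described above: per-category win rate against the baseline,
# with ties counted as 0.5, averaged across categories so each category
# contributes equally to the final score.

from collections import defaultdict

def weighted_win_rate(judgements):
    """judgements: list of (category, outcome) pairs, where outcome is
    'win', 'tie', or 'loss' for the candidate model vs. the baseline."""
    points = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    scores = defaultdict(list)
    for category, outcome in judgements:
        scores[category].append(points[outcome])
    # Average within each category first, then average across categories,
    # so categories with more questions do not dominate the result.
    per_category = {c: sum(v) / len(v) for c, v in scores.items()}
    return sum(per_category.values()) / len(per_category)

# Hypothetical example: one tie in Math, one win in Writing -> 0.75
print(weighted_win_rate([("Math", "tie"), ("Writing", "win")]))
```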