arvindabacus committed
Commit fbaa713 • Parent(s): b7b3720
Update README.md

README.md CHANGED
@@ -82,7 +82,7 @@ print(outputs[0]["generated_text"][len(prompt):])
 
 ### Arena-Hard
 
-Score vs selected others (sourced from: https://lmsys.org/blog/2024-04-19-arena-hard/#full-leaderboard-with-gpt-4-turbo-as-judge)
+Score vs selected others (sourced from: https://lmsys.org/blog/2024-04-19-arena-hard/#full-leaderboard-with-gpt-4-turbo-as-judge). GPT-4o and Gemini-1.5-pro-latest were missing from the original blog post, and we produced those numbers from a local run using the same methodology.
 
 | Model | Score | 95% Confidence Interval | Average Tokens |
 | :---- | ---------: | ----------: | ------: |