mfajcik committed on
Commit 715ec0a • 1 Parent(s): c980420

Update content.py

Files changed (1)
  1. content.py +1 -1
content.py CHANGED
@@ -95,7 +95,7 @@ We use the following tests, with varying statistical power:
 
  ### Duel Scoring Mechanism, Win Score
  On each task, each model is scored to each model (up to top-50 currently submitted models). For each model, record proportion of won duels: **Win Score**(WS).
- Next, the *Category Win Score**(CWS), is computed as an average over model's WSs in that category. Similarly, 🇨🇿 **BenCzechMark Win Score** is computed as model's average CWS across categories.
+ Next, the **Category Win Score**(CWS), is computed as an average over model's WSs in that category. Similarly, 🇨🇿 **BenCzechMark Win Score** is computed as model's average CWS across categories.
  The properties of this ranking mechanism include:
  - Ranking can change after every submission.
  - The across-task aggregation is interpretable: in words, it measures the average proportion of times the model is better.
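
For readers of this change, the aggregation described in the edited text can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the nested-dictionary layout, and the model, task, and category labels are hypothetical and do not come from content.py or the leaderboard code. A duel outcome is modelled as a plain boolean, whereas the text above says duels are decided by statistical tests, which are not reproduced here.

```python
from statistics import mean


def win_score(duels):
    """WS on one task: proportion of duels won.

    `duels` maps opponent name -> True if the model won that duel
    (hypothetical layout, assumed for this sketch).
    """
    return sum(duels.values()) / len(duels)


def category_win_score(tasks):
    """CWS: average of the model's WSs over the tasks in one category."""
    return mean([win_score(duels) for duels in tasks.values()])


def overall_win_score(categories):
    """Overall score: average of the model's CWSs across categories."""
    return mean([category_win_score(tasks) for tasks in categories.values()])


# Made-up duel outcomes for a single model against two opponents.
results = {
    "category_a": {
        "task_1": {"model_x": True, "model_y": False},  # WS = 0.5
        "task_2": {"model_x": True, "model_y": True},   # WS = 1.0
    },
    "category_b": {
        "task_3": {"model_x": False, "model_y": False},  # WS = 0.0
    },
}

score = overall_win_score(results)  # mean of CWS 0.75 and CWS 0.0
```

Because each category contributes one CWS to the final average, a category with few tasks weighs as much as a category with many tasks in this sketch.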