hynky (HF staff) committed
Commit 8507b02 · 1 Parent(s): ed9481f

hide aside on small screens + move one aside up

Files changed (2)
  1. app/src/index.html +1 -1
  2. app/src/style.css +11 -4
app/src/index.html CHANGED
@@ -125,12 +125,12 @@
  <div class="task-signal-plot" data-language="Telugu" data-task="tydiqa_tel" data-show-controls="false" data-task-metrics="snr" data-metric="acc_norm_token" data-group-seeds="false" data-title="❌ Bad SNR: tydiqa_tel [te]"></div>
  </div>

+ <aside>Assuming model performance is normally distributed across different seeds, we want the benchmark-run performance to be at least 3 final-stds above the benchmark random baseline. This would mean that 99.85% of seed scores are above the random baseline (formally, benchmark-run performance - benchmark random baseline > 3 * final-std).</aside>
  <h4>Non-Random Performance</h4>
  <p>Many model capabilities are acquired later in training, thus <b>many tasks</b> (especially harder ones, such as math-related ones) <b>show baseline-level performance for an extended period</b>. While these tasks are useful, they're not ideal for early pre-training evaluation, and <b>we did not want to keep them</b> for this setting.</p>

  <p>We first computed the baseline random performance of the task (as the sum of 1/n_choices for all samples for multiple choice questions, and as zero for generative evaluations). Then we calculated the task's distance from the baseline as the maximum score across all models minus the baseline.</p>

- <aside>Assuming model performance is normally distributed across different seeds, we want the benchmark-run performance to be at least 3 final-stds above the benchmark random baseline. This would mean that 99.85% of seed scores are above the random baseline (formally, benchmark-run performance - benchmark random baseline > 3 * final-std).</aside>

  <div style="display: flex; grid-column: middle">
  <div class="task-signal-plot" data-language="Chinese" data-task="agieval_zho_cf:_average" data-show-controls="false" data-task-metrics="randomness" data-metric="acc_norm_pmi" data-group-seeds="true" data-title="✅ Non-random: agieval_zho_cf/acc_pmi [zh]"></div>
app/src/style.css CHANGED
@@ -121,10 +121,17 @@ d-contents nav > div > a {
  }

  d-article aside {
- height: 0px;
- overflow: visible;
- margin-bottom: 1em;
- z-index: 1000;
+ display: none;
+ }
+
+ @media (min-width: 768px) {
+ d-article aside {
+ display: block;
+ height: 0px;
+ overflow: visible;
+ margin-bottom: 1em;
+ z-index: 1000;
+ }
  }

  @media (min-width: 768px) {