hide aside on small screens + move one aside up
- app/src/index.html +1 -1
- app/src/style.css +11 -4
app/src/index.html
CHANGED
@@ -125,12 +125,12 @@
 <div class="task-signal-plot" data-language="Telugu" data-task="tydiqa_tel" data-show-controls="false" data-task-metrics="snr" data-metric="acc_norm_token" data-group-seeds="false" data-title="❌ Bad SNR: tydiqa_tel [te]"></div>
 </div>
 
+<aside>Assuming model performance is normally distributed across different seeds, we want the benchmark-run performance to be at least 3 final-stds above the benchmark random baseline. This would mean that 99.85% of seed scores are above the random baseline (formally, benchmark-run performance - benchmark random baseline > 3 * final-std).</aside>
 <h4>Non-Random Performance</h4>
 <p>Many model capabilities are acquired later in training, thus <b>many tasks</b> (especially harder ones, such as math-related ones) <b>show baseline-level performance for an extended period</b>. While these tasks are useful, they're not ideal for early pre-training evaluation, and <b>we did not want to keep them</b> for this setting.</p>
 
 <p>We first computed the baseline random performance of the task (as the sum of 1/n_choices for all samples for multiple choice questions, and as zero for generative evaluations). Then we calculated the task's distance from the baseline as the maximum score across all models minus the baseline.</p>
 
-<aside>Assuming model performance is normally distributed across different seeds, we want the benchmark-run performance to be at least 3 final-stds above the benchmark random baseline. This would mean that 99.85% of seed scores are above the random baseline (formally, benchmark-run performance - benchmark random baseline > 3 * final-std).</aside>
 
 <div style="display: flex; grid-column: middle">
 <div class="task-signal-plot" data-language="Chinese" data-task="agieval_zho_cf:_average" data-show-controls="false" data-task-metrics="randomness" data-metric="acc_norm_pmi" data-group-seeds="true" data-title="✅ Non-random: agieval_zho_cf/acc_pmi [zh]"></div>
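The criterion stated in the moved aside (benchmark-run performance - benchmark random baseline > 3 * final-std) can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the repository: the function names `random_baseline` and `is_non_random` and all numbers are hypothetical, and the per-sample 1/n_choices terms are assumed to be averaged so that the baseline sits on the same 0-1 scale as accuracy.

```python
from statistics import NormalDist


def random_baseline(n_choices_per_sample):
    """Expected accuracy of uniform random guessing on a multiple-choice task.

    Assumption: the per-sample 1/n_choices terms are averaged so the result
    is on the same 0-1 scale as accuracy. An empty list stands in for a
    generative task, whose baseline is taken as zero.
    """
    if not n_choices_per_sample:
        return 0.0
    return sum(1.0 / n for n in n_choices_per_sample) / len(n_choices_per_sample)


def is_non_random(run_score, baseline, final_std, k=3.0):
    """True when the run score clears the baseline by more than k final-stds."""
    return run_score - baseline > k * final_std


# Hypothetical numbers, for illustration only.
baseline = random_baseline([4, 4, 4, 2])  # three 4-way questions, one 2-way
passes = is_non_random(run_score=0.42, baseline=baseline, final_std=0.02)

# One-sided share of a normal distribution lying above (mean - 3 std),
# which is the quantity the aside approximates.
share_above = NormalDist().cdf(3.0)
```

Under the normality assumption, `share_above` evaluates to roughly 0.9987, close to the 99.85% quoted in the aside.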
app/src/style.css
CHANGED
@@ -121,10 +121,17 @@ d-contents nav > div > a {
 }
 
 d-article aside {
-
-
-
-
+  display: none;
+}
+
+@media (min-width: 768px) {
+  d-article aside {
+    display: block;
+    height: 0px;
+    overflow: visible;
+    margin-bottom: 1em;
+    z-index: 1000;
+  }
 }
 
 @media (min-width: 768px) {