Kill trailing
app/src/index.html (CHANGED, +1 -1)
@@ -233,7 +233,7 @@ def get_canonical_form(expression: str) -> str:
 <li><strong>Oracle: </strong>use the ground truth labels to estimate the pass@1 score per problem. Bin the distribution of pass@1 scores to determine the quintiles.</li>
 <li><strong>Model: </strong>use the distribution of average PRM scores per problem to determine the quintiles. The intuition here is that harder problems will have lower scores.</li>
 </ul>
-
+
 <p id="15d1384e-bcac-80a3-af7c-f3497126ab1e" class="">Here’s the breakdown of the various methods according to the pass@1 scores and across four test-time compute budgets of \(N = [4,16,64, 256]\):</p><figure id="15b1384e-bcac-80ad-9cf3-cf5bcbd3f53b" class="image"><a href="https://huggingface.co/datasets/HuggingFaceH4/blogpost-images/resolve/main/levels-maj-bon-beam.png"><img style="width:707.9891357421875px" src="https://huggingface.co/datasets/HuggingFaceH4/blogpost-images/resolve/main/levels-maj-bon-beam.png"/></a></figure><p id="15d1384e-bcac-80c3-93b3-fa4c071ac807" class="">In this plot, each bar denotes a test-time compute budget, and within each bar we show the relative accuracy of each method. For example, in the group of four bars on difficulty level 2 we see that:</p>
 
 <ul>
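The Oracle and Model difficulty binning described in the context lines of this hunk can be sketched as follows. This is a minimal, hypothetical Python sketch, not the post's actual code. The helper names (`pass_at_1`, `mean_prm_score`, `assign_levels`) are invented for illustration. It assumes each problem has several sampled solutions, each carrying either a boolean correctness label against the ground truth (Oracle) or a scalar PRM score (Model). It also assumes level 1 is the easiest quintile and level 5 the hardest. Cut points come from Python's `statistics.quantiles` with its default method, and the original may bin differently.

```python
from statistics import quantiles
from typing import Dict, List, Sequence


def pass_at_1(labels: Sequence[bool]) -> float:
    """Oracle signal: fraction of sampled solutions judged correct against the ground truth.

    With n samples and c correct, c / n is the standard unbiased estimate of pass@1.
    """
    return sum(labels) / len(labels)


def mean_prm_score(scores: Sequence[float]) -> float:
    """Model signal (assumed form): average PRM score over a problem's sampled solutions."""
    return sum(scores) / len(scores)


def assign_levels(per_problem: Dict[str, float]) -> Dict[str, int]:
    """Bin per-problem scores into quintiles (hypothetical helper).

    Assumption: a lower score means a harder problem, so the lowest quintile
    maps to level 5 and the highest quintile maps to level 1.
    """
    # Four cut points split the score distribution into five roughly equal-count bins.
    cuts = quantiles(list(per_problem.values()), n=5)
    levels: Dict[str, int] = {}
    for problem_id, score in per_problem.items():
        # Count how many cut points the score exceeds: 0 (lowest bin) .. 4 (highest bin).
        bin_index = sum(score > cut for cut in cuts)
        levels[problem_id] = 5 - bin_index
    return levels


if __name__ == "__main__":
    # Hypothetical data: 9 sampled solutions per problem, problem i has i correct ones.
    labels: Dict[str, List[bool]] = {
        f"p{i}": [True] * i + [False] * (9 - i) for i in range(10)
    }
    oracle_scores = {pid: pass_at_1(ls) for pid, ls in labels.items()}
    print(assign_levels(oracle_scores))
```

Passing per-problem PRM averages from `mean_prm_score` to `assign_levels` instead of pass@1 values gives the Model variant. Because both signals are assumed to fall as difficulty rises, the same quintile binning applies to either.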