Add reference to AlphaMath
app/src/index.html (+1 −1)
@@ -100,7 +100,7 @@
 <ul>
 <li><strong>Best-of-N: </strong>Generate multiple responses per problem and assign scores to each candidate answer, typically using a reward model. Then select the answer with the highest reward (or a weighted variant discussed later). This approach emphasizes answer quality over frequency.</li>
 <li><strong>Beam search: </strong>A systematic search method that explores the solution space, often combined with a <em>process reward model (PRM)</em><d-cite key="prm"></d-cite> to optimise both the sampling and evaluation of intermediate steps in problem-solving. Unlike conventional reward models that produce a single score on the final answer, PRMs provide a <em>sequence </em>of scores, one for each step of the reasoning process. This ability to provide fine-grained feedback makes PRMs a natural fit for search methods with LLMs.</li>
-<li><strong>Diverse verifier tree search (DVTS):</strong> An extension of beam search we developed that splits the initial beams into independent subtrees, which are then expanded greedily using a PRM.<d-footnote>DVTS is similar to <a href="https://huggingface.co/papers/1610.02424">diverse beam search (DBS)</a> with the main difference that beams share a common prefix in DBS and no sampling is used. DVTS is also similar to <a href="https://huggingface.co/papers/2306.09896">code repair trees</a>, although it is not restricted to code generation models and discrete verifiers.</d-footnote> This method improves solution diversity and overall performance, particularly with larger test-time compute budgets.</li>
+<li><strong>Diverse verifier tree search (DVTS):</strong> An extension of beam search we developed that splits the initial beams into independent subtrees, which are then expanded greedily using a PRM.<d-footnote>DVTS is similar to <a href="https://huggingface.co/papers/1610.02424">diverse beam search (DBS)</a> with the main difference that beams share a common prefix in DBS and no sampling is used. DVTS is also similar to <a href="https://huggingface.co/papers/2306.09896">code repair trees</a>, although it is not restricted to code generation models and discrete verifiers. After the publication of this blog post, we were made aware of <a href="https://huggingface.co/papers/2405.03553">step-level beam search</a>, which is most similar to DVTS and uses a value head to predict the most promising steps instead of a PRM.</d-footnote> This method improves solution diversity and overall performance, particularly with larger test-time compute budgets.</li>
 </ul>

 <p id="15a1384e-bcac-803c-bc89-ed15f18eafdc" class="">With an understanding of the key search strategies, let’s move on to how we evaluated them in practice.</p>
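For context on the list being edited in this diff: the Best-of-N strategy it describes (including the weighted variant the blog post discusses later) can be sketched in a few lines of Python. This is a minimal illustration, not the blog's implementation; `generate` and `score` are hypothetical placeholders for an LLM sampler and a reward model.

```python
from collections import defaultdict


def best_of_n(problem, generate, score, n=8, weighted=False):
    """Sample n candidate answers and select one via reward scores.

    `generate` and `score` are hypothetical stand-ins for an LLM
    sampler and a reward model; any callables with these shapes work.
    """
    candidates = [generate(problem) for _ in range(n)]
    scores = [score(problem, c) for c in candidates]
    if not weighted:
        # Vanilla Best-of-N: return the single highest-reward answer.
        return max(zip(candidates, scores), key=lambda cs: cs[1])[0]
    # Weighted Best-of-N: sum rewards over identical answers, so an
    # answer sampled often with decent scores can beat a one-off
    # high-scoring outlier (quality and frequency both count).
    totals = defaultdict(float)
    for candidate, s in zip(candidates, scores):
        totals[candidate] += s
    return max(totals, key=totals.get)
```

Note how `weighted=True` trades the pure "quality over frequency" behaviour of vanilla Best-of-N for a vote that aggregates reward mass per distinct answer.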