Add reference to AlphaMath
app/src/index.html (+1 −1)
@@ -100,7 +100,7 @@
 <ul>
 <li><strong>Best-of-N: </strong>Generate multiple responses per problem and assign scores to each candidate answer, typically using a reward model. Then select the answer with the highest reward (or a weighted variant discussed later). This approach emphasizes answer quality over frequency.</li>
 <li><strong>Beam search: </strong>A systematic search method that explores the solution space, often combined with a <em>process reward model (PRM)</em><d-cite key="prm"></d-cite> to optimise both the sampling and evaluation of intermediate steps in problem-solving. Unlike conventional reward models that produce a single score on the final answer, PRMs provide a <em>sequence </em>of scores, one for each step of the reasoning process. This ability to provide fine-grained feedback makes PRMs a natural fit for search methods with LLMs.</li>
-<li><strong>Diverse verifier tree search (DVTS):</strong> An extension of beam search we developed that splits the initial beams into independent subtrees, which are then expanded greedily using a PRM.<d-footnote>DVTS is similar to <a href="https://huggingface.co/papers/1610.02424">diverse beam search (DBS)</a> with the main difference that beams share a common prefix in DBS and no sampling is used. DVTS is also similar to <a href="https://huggingface.co/papers/2306.09896">code repair trees</a>, although it is not restricted to code generation models and discrete verifiers.</d-footnote> This method improves solution diversity and overall performance, particularly with larger test-time compute budgets.</li>
+<li><strong>Diverse verifier tree search (DVTS):</strong> An extension of beam search we developed that splits the initial beams into independent subtrees, which are then expanded greedily using a PRM.<d-footnote>DVTS is similar to <a href="https://huggingface.co/papers/1610.02424">diverse beam search (DBS)</a> with the main difference that beams share a common prefix in DBS and no sampling is used. DVTS is also similar to <a href="https://huggingface.co/papers/2306.09896">code repair trees</a>, although it is not restricted to code generation models and discrete verifiers. After the publication of this blog post, we were made aware of <a href="https://huggingface.co/papers/2405.03553">step-level beam search</a>, which is most similar to DVTS and uses a value head to predict the most promising steps instead of a PRM.</d-footnote> This method improves solution diversity and overall performance, particularly with larger test-time compute budgets.</li>
 </ul>

 <p id="15a1384e-bcac-803c-bc89-ed15f18eafdc" class="">With an understanding of the key search strategies, let’s move on to how we evaluated them in practice.</p>
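For context on the list being edited in this diff: the Best-of-N strategy it describes (including the weighted variant the blog post discusses later) can be sketched in a few lines of Python. This is a minimal illustration, not the blog's implementation; `generate` and `score` are hypothetical placeholders for an LLM sampler and a reward model.

```python
from collections import defaultdict


def best_of_n(problem, generate, score, n=8, weighted=False):
    """Sample n candidate answers and select one via reward scores.

    `generate` and `score` are hypothetical stand-ins for an LLM
    sampler and a reward model; any callables with these shapes work.
    """
    candidates = [generate(problem) for _ in range(n)]
    scores = [score(problem, c) for c in candidates]
    if not weighted:
        # Vanilla Best-of-N: return the single highest-reward answer.
        return max(zip(candidates, scores), key=lambda cs: cs[1])[0]
    # Weighted Best-of-N: sum rewards over identical answers, so an
    # answer sampled often with decent scores can beat a one-off
    # high-scoring outlier (quality and frequency both count).
    totals = defaultdict(float)
    for candidate, s in zip(candidates, scores):
        totals[candidate] += s
    return max(totals, key=totals.get)
```

Note how `weighted=True` trades the pure "quality over frequency" behaviour of vanilla Best-of-N for a vote that aggregates reward mass per distinct answer.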