lewtun (HF staff) committed on
Commit
2f11f71
1 Parent(s): f079c42
Files changed (1):
  1. app/src/index.html +1 -1
app/src/index.html CHANGED
@@ -61,7 +61,7 @@
   </d-contents>
 
   <!-- INTRODUCTION -->
-  <p>Over the last few years, the scaling of <em><strong>train-time compute</strong></em> has dominated the progress of large language models (LLMs).<d-footnote>Here, train-time compute refers to increasing model size, dataset size, and compute budgets in line with <a href="https://huggingface.co/papers/2001.08361">scaling laws.</d-footnote>Although this paradigm has proven to be remarkably effective, the resources needed to pretrain ever larger models are becoming prohibitively expensive, with <a href="https://youtu.be/WXhikNA5PIc?feature=shared">billion-dollar clusters</a> already on the horizon.<d-footnote>Aside from compute resources, Ilya Sutskever has made the <a href="https://www.youtube.com/watch?feature=shared&t=475&v=1yvBqasHLZs">provocative analogy</a> that pretraining data is the “fossil fuel of AI” and that pretraining as we know it will end once this resource is exhausted in the near future.</d-footnote> This trend has sparked significant interest in a complementary approach: <em><strong>test-time compute scaling</strong></em>. Rather than relying on ever-larger pretraining budgets, test-time methods use dynamic inference strategies that allow models to “think longer” on harder problems. A prominent example is <a href="https://openai.com/index/learning-to-reason-with-llms/">OpenAI’s o1 model</a>, which shows consistent improvement on difficult math problems as one increases the amount of test-time compute:</p>
+  <p>Over the last few years, the scaling of <em><strong>train-time compute</strong></em> has dominated the progress of large language models (LLMs).<d-footnote>Here, train-time compute refers to increasing model size, dataset size, and compute budgets in line with <a href="https://huggingface.co/papers/2001.08361">scaling laws.</a></d-footnote>Although this paradigm has proven to be remarkably effective, the resources needed to pretrain ever larger models are becoming prohibitively expensive, with <a href="https://youtu.be/WXhikNA5PIc?feature=shared">billion-dollar clusters</a> already on the horizon.<d-footnote>Aside from compute resources, Ilya Sutskever has made the <a href="https://www.youtube.com/watch?feature=shared&t=475&v=1yvBqasHLZs">provocative analogy</a> that pretraining data is the “fossil fuel of AI” and that pretraining as we know it will end once this resource is exhausted in the near future.</d-footnote> This trend has sparked significant interest in a complementary approach: <em><strong>test-time compute scaling</strong></em>. Rather than relying on ever-larger pretraining budgets, test-time methods use dynamic inference strategies that allow models to “think longer” on harder problems. A prominent example is <a href="https://openai.com/index/learning-to-reason-with-llms/">OpenAI’s o1 model</a>, which shows consistent improvement on difficult math problems as one increases the amount of test-time compute:</p>
 
   <figure id="1581384e-bcac-805f-8c2b-dff4509f45cb" class="image"><a href="https://huggingface.co/datasets/HuggingFaceH4/blogpost-images/resolve/main/compute.png.webp"><img style="width:672px" src="https://huggingface.co/datasets/HuggingFaceH4/blogpost-images/resolve/main/compute.png.webp"/></a></figure>
 