EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
Abstract
Two methods, dynamic CoT switching and diverse parallel-scaled RL, reduce computational cost in ATP models while maintaining performance.
Large Language Models (LLMs) have recently advanced the field of Automated Theorem Proving (ATP), attaining substantial performance gains through widely adopted test-time scaling strategies, notably reflective Chain-of-Thought (CoT) reasoning and increased sampling passes. However, both introduce significant computational overhead at inference time. Moreover, existing cost analyses typically regulate only the number of sampling passes, while neglecting the substantial disparities in per-pass cost introduced by different scaling strategies. In this paper, we systematically compare the efficiency of different test-time scaling strategies for ATP models and demonstrate the inefficiency of current state-of-the-art (SOTA) open-source approaches. We then investigate approaches that significantly reduce token usage and sampling passes while maintaining the original performance. Specifically, we propose two complementary methods that can be integrated into a unified EconRL pipeline for amplified benefits: (1) a dynamic Chain-of-Thought (CoT) switching mechanism designed to mitigate unnecessary token consumption, and (2) diverse parallel-scaled reinforcement learning (RL) with trainable prefixes to enhance pass rates under constrained sampling passes. Experiments on miniF2F and ProofNet demonstrate that our EconProver achieves comparable performance to baseline methods with only 12% of the computational cost. This work provides actionable insights for deploying lightweight ATP models without sacrificing performance.
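The dynamic CoT switching mechanism can be sketched as a per-problem router: spend tokens on reflective chain-of-thought only when a cheap signal suggests the theorem needs it. The sketch below is illustrative only; `difficulty_score`, `generate`, and the threshold are hypothetical stand-ins, not the paper's actual implementation.

```python
def prove(theorem: str, generate, difficulty_score, threshold: float = 0.5) -> str:
    """Route a theorem to cheap direct generation or costly reflective CoT.

    `difficulty_score` and `generate` are hypothetical callables; the paper's
    switching signal and decoding interface may differ.
    """
    if difficulty_score(theorem) < threshold:
        # Easy case: emit the proof directly, avoiding unnecessary CoT tokens.
        return generate(theorem, mode="direct")
    # Hard case: pay for reflective chain-of-thought before the proof.
    return generate(theorem, mode="reflective_cot")


# Toy stand-ins so the sketch runs end to end.
def toy_score(theorem: str) -> float:
    return len(theorem) / 100.0  # pretend longer statements are harder


def toy_generate(theorem: str, mode: str) -> str:
    return f"{mode}:{theorem}"
```

The point of the gate is that token cost is paid per problem rather than uniformly: short, easy statements skip the reflective trace entirely.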
Community
Check out our new paper: EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
Large Language Models (LLMs) are pushing boundaries in ATP via techniques like reflective CoT reasoning and high sample counts, but at steep inference costs.
EconProver proposes a more efficient pipeline:
- Dynamic CoT switching: avoids unnecessary token consumption.
- Diverse parallel-scaled RL with trainable prefixes: more "proofs per pass" under tight sampling constraints.
On miniF2F and ProofNet: comparable baseline performance with only 12% of the computational cost.
Why it matters: lightweight ATP models stay performant while cutting inference costs, which is important for scaling theorem proving in resource-constrained settings.
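The trainable-prefix idea behind diverse parallel scaling can be sketched as follows: each of the k parallel passes gets its own learned prefix prepended to the prompt, steering the samples toward different proof strategies so a fixed sampling budget covers more of the search space. Everything here is a hypothetical illustration (random floats stand in for learned prefix embeddings, and the RL optimization of the prefixes is omitted).

```python
import random

K = 4           # number of parallel sampling passes (hypothetical)
PREFIX_LEN = 3  # length of each trainable prefix (hypothetical)

# One distinct prefix per parallel pass. In the paper these would be
# trainable and optimized with RL; here they are just random vectors.
prefixes = [[random.random() for _ in range(PREFIX_LEN)] for _ in range(K)]


def expand(prompt_tokens):
    """Duplicate the prompt K times, each copy led by a distinct prefix.

    Returns a batch of K sequences; decoding each in parallel yields K
    deliberately diversified samples for the same theorem.
    """
    return [prefixes[i] + list(prompt_tokens) for i in range(K)]


batch = expand([0.1, 0.2])
```

Compared with plain temperature sampling, distinct learned prefixes make the k passes diverge by construction, which is what raises pass rates when the number of passes is capped.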
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API.
- Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction (2025)
- Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal (2025)
- Scaling Up, Speeding Up: A Benchmark of Speculative Decoding for Efficient LLM Test-Time Scaling (2025)
- Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models (2025)
- SABER: Switchable and Balanced Training for Efficient LLM Reasoning (2025)
- StepFun-Prover Preview: Let's Think and Verify Step by Step (2025)
- Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving (2025)