Model Card for micaebe/Qwen2.5-0.5B-MCTS-Value-Net

This is a Qwen2.5 0.5B Instruct model fine-tuned as a value network on a dataset generated by Monte Carlo Tree Search (MCTS) sampling. MCTS rollouts were run on a small subset of the GSM8K train split, and the resulting traces and value estimates form the training dataset. Only the last two transformer blocks and the regression head were unfrozen; all other weights stayed frozen.
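The partial unfreezing described above could be expressed as a name-based filter over the model's parameters. The sketch below is an illustration, not the training code used for this model: the helper `select_trainable` is hypothetical, the parameter names follow the `model.layers.<i>.` convention commonly seen in Hugging Face Qwen2 checkpoints, and both the head prefix `score.` and the block count of 24 are assumptions.

```python
import re
from typing import Iterable, List, Sequence


def select_trainable(
    param_names: Iterable[str],
    num_layers: int,
    n_last_blocks: int = 2,
    head_prefixes: Sequence[str] = ("score.",),  # assumed name of the regression head
) -> List[str]:
    """Return the parameter names that should stay trainable.

    A parameter is kept trainable if it belongs to one of the last
    `n_last_blocks` transformer blocks or to the regression head.
    """
    first_trainable = num_layers - n_last_blocks
    layer_re = re.compile(r"(?:^|\.)layers\.(\d+)\.")
    trainable = []
    for name in param_names:
        match = layer_re.search(name)
        in_last_blocks = match is not None and int(match.group(1)) >= first_trainable
        is_head = name.startswith(tuple(head_prefixes))
        if in_last_blocks or is_head:
            trainable.append(name)
    return trainable


# Toy parameter names (assumed layout with 24 blocks, indexed 0..23).
example_names = [
    "model.embed_tokens.weight",
    "model.layers.0.mlp.up_proj.weight",
    "model.layers.22.self_attn.q_proj.weight",
    "model.layers.23.mlp.down_proj.weight",
    "model.norm.weight",
    "score.weight",
]
print(select_trainable(example_names, num_layers=24))
```

With a PyTorch model, the result would typically be applied by looping over `model.named_parameters()` and setting `param.requires_grad` to `True` only for the selected names.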

The idea is to guide MCTS sampling with the value network alone, so that no simulation (rollout) is needed to evaluate a node.
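A minimal sketch of what rollout-free MCTS can look like is given below. It is not the search code behind this model: `expand_fn` (proposes successor states) and `value_fn` (stands in for the value network) are hypothetical callables, states are plain strings, and the UCT selection rule with exploration constant `c` is an assumed choice. The point it illustrates is the evaluation step, where a leaf is scored by calling `value_fn` directly and no simulation is run.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    state: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def uct_score(child: Node, parent_visits: int, c: float) -> float:
    """Upper confidence bound; unvisited children are tried first."""
    if child.visits == 0:
        return math.inf
    return child.mean_value() + c * math.sqrt(math.log(parent_visits) / child.visits)


def search(
    root_state: str,
    expand_fn: Callable[[str], List[str]],  # hypothetical: proposes next states
    value_fn: Callable[[str], float],       # hypothetical: stands in for the value network
    n_iterations: int = 100,
    c: float = 1.4,
) -> Node:
    root = Node(root_state)
    for _ in range(n_iterations):
        # Selection: walk down the tree until a leaf is reached.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: uct_score(ch, node.visits, c))
        # Expansion: expand the root immediately, other leaves on their second visit.
        if node is root or node.visits > 0:
            node.children = [Node(s, parent=node) for s in expand_fn(node.state)]
            if node.children:
                node = node.children[0]
        # Evaluation: ask the value function directly instead of rolling out.
        value = value_fn(node.state)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return root


def most_visited_child(root: Node) -> Node:
    return max(root.children, key=lambda ch: ch.visits)


if __name__ == "__main__":
    # Toy problem: states are bit strings of length <= 3, and the stand-in
    # value function prefers states that start with "1".
    toy_expand = lambda s: [s + "0", s + "1"] if len(s) < 3 else []
    toy_value = lambda s: 1.0 if s.startswith("1") else 0.0
    tree = search("", toy_expand, toy_value, n_iterations=200)
    print(most_visited_child(tree).state)
```

In a real setup, `value_fn` would wrap a forward pass of the fine-tuned model on the partial solution, and `expand_fn` would sample candidate next reasoning steps from the policy model.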

Currently the value network overfits because the training set is very small. I plan to update the model once I have sampled more data.

Scores on the first 65 samples of the GSM8K test split:

  • Beam search (3 beams): 40.0% (26/65)
  • MCTS search (3 beams): 50.77% (33/65)

The final rollout of the MCTS search is also done via beam search. During testing on GSM8K, only the value network was used to guide the search.
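For readers unfamiliar with beam search, a generic sketch follows. It is not the decoding code used in these tests: `step_fn` (returns candidate tokens with log-probabilities) and `is_done` are hypothetical callables, and scoring beams by cumulative log-probability is the textbook formulation, assumed here for illustration.

```python
import math
from typing import Callable, List, Tuple

Beam = Tuple[float, List[str]]  # (cumulative log-probability, token sequence)


def beam_search(
    start: List[str],
    step_fn: Callable[[List[str]], List[Tuple[str, float]]],  # hypothetical next-token scorer
    is_done: Callable[[List[str]], bool],                     # hypothetical stop condition
    beam_width: int = 3,
    max_steps: int = 10,
) -> List[Beam]:
    beams: List[Beam] = [(0.0, list(start))]
    for _ in range(max_steps):
        candidates: List[Beam] = []
        for score, seq in beams:
            if is_done(seq):
                # Finished sequences are carried over unchanged.
                candidates.append((score, seq))
                continue
            for token, logp in step_fn(seq):
                candidates.append((score + logp, seq + [token]))
        # Keep only the `beam_width` highest-scoring candidates.
        beams = sorted(candidates, key=lambda b: b[0], reverse=True)[:beam_width]
        if all(is_done(seq) for _, seq in beams):
            break
    return beams


if __name__ == "__main__":
    # Toy distribution: "a" has probability 0.6 and "b" has 0.4 at every step.
    toy_step = lambda seq: [("a", math.log(0.6)), ("b", math.log(0.4))]
    toy_done = lambda seq: len(seq) >= 2
    for score, seq in beam_search([], toy_step, toy_done, beam_width=3):
        print(seq, round(math.exp(score), 2))
```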

All tests were done with Qwen2.5 0.5B Instruct.

Model size: 494M params (Safetensors, tensor type F32)

Base model: Qwen/Qwen2.5-0.5B (micaebe/Qwen2.5-0.5B-MCTS-Value-Net is a fine-tune of it)