Model Details

This is the final checkpoint of an OLMo 1B model that was pretrained on Algebraic Stack, FineMath3+, TinyGSM, OpenMathInstruct1, and OpenMathInstruct2, and then trained with PPO on the GSM8K training set.

Checkpoints are saved at the following timesteps:

  • rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_base: Initial model after pretraining.

  • rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_episode{1-9}: Saved after each of the first nine epochs over the GSM8K training set.

  • rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_global_step{9, 13, 18, 25, 36, 51, 73, 103, 146, 206, 291, 411, 581, 821}: Saved on a log scale across global steps (computed from `[int(n) for n in np.logspace(-2.1, 0, 15) * 1160]`).

Note that the current model, rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_ppo, is the final model after RLVR and is equivalent to the _episode10 and _global_step1160 checkpoints.
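The log-spaced checkpoint schedule above can be reproduced directly from the formula in the card. The snippet below recomputes the saved global steps; note that the formula yields 15 values, the last of which (1160) corresponds to the final _ppo model rather than a separate _global_step checkpoint.

```python
import numpy as np

# Recompute the log-spaced checkpoint steps: 15 points from 10^-2.1 to 10^0,
# scaled by the total number of training steps (1160), truncated to ints.
steps = [int(n) for n in np.logspace(-2.1, 0, 15) * 1160]
print(steps)
# → [9, 13, 18, 25, 36, 51, 73, 103, 146, 206, 291, 411, 581, 1160 is last]
```

The first 14 values match the _global_step{...} checkpoint names; the 15th value is 1160, the final step.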

Model size: 1B params (F32, Safetensors)