Model Details
This is the final checkpoint of an OLMo 1B model that was pretrained on Algebraic Stack, FineMath 3+, TinyGSM, OpenMathInstruct-1, and OpenMathInstruct-2, then fine-tuned with PPO on the GSM8K training set.
Checkpoints are saved at the following timesteps:
- `rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_base`: initial model after pretraining.
- `rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_episode{1-9}`: saved after each epoch over GSM8K train.
- `rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_global_step{9, 13, 18, 25, 36, 51, 73, 103, 146, 206, 291, 411, 581, 821}`: saved on a log scale across global steps, computed from `[int(n) for n in np.logspace(-2.1, 0, 15) * 1160]`.
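The log-spaced step schedule above can be reproduced with the standard library alone; a minimal sketch (only the repo names come from this card — the `prefix` variable and `checkpoints` list are illustrative helpers, not an official API):

```python
import math

TOTAL_STEPS = 1160  # total PPO global steps (10 epochs over GSM8K train)

# Stdlib equivalent of [int(n) for n in np.logspace(-2.1, 0, 15) * 1160]:
# 15 exponents evenly spaced from -2.1 to 0, each mapped through 10**x.
exponents = [-2.1 * (14 - i) / 14 for i in range(15)]
steps = [int(10 ** e * TOTAL_STEPS) for e in exponents]
# -> [9, 13, 18, 25, 36, 51, 73, 103, 146, 206, 291, 411, 581, 821, 1160]

# Illustrative: enumerate every checkpoint repo id listed on this card,
# e.g. for sweeping evaluations across training.
prefix = "rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2"
checkpoints = (
    [f"{prefix}_base"]
    + [f"{prefix}_episode{i}" for i in range(1, 10)]
    + [f"{prefix}_global_step{s}" for s in steps[:-1]]  # step 1160 is the final model
    + [f"{prefix}_ppo"]
)
print(steps)
print(len(checkpoints))
```

Note that the last value of the schedule, 1160, is not saved as a `_global_step` checkpoint; it corresponds to the final `_ppo` model.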
Note that the current model, `rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_ppo`, is the final model after RLVR and is equivalent to `_episode10` and `_global_step1160`.