Model Details
This is the final checkpoint of an OLMo 1B model that was pretrained on Algebraic Stack, FineMath 3+, TinyGSM, OpenMathInstruct-1, and OpenMathInstruct-2, then fine-tuned with PPO on the GSM8K training set.
Checkpoints are saved at the following timesteps:
- `rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_base`: initial model after pretraining.
- `rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_episode{1-9}`: saved after each epoch over GSM8K train.
- `rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_global_step{9, 13, 18, 25, 36, 51, 73, 103, 146, 206, 291, 411, 581, 821}`: saved on a log scale across global steps, computed from `[int(n) for n in np.logspace(-2.1, 0, 15) * 1160]`.
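The log-spaced step schedule above can be reproduced with the standard library alone; a minimal sketch (only the repo names come from this card — the `prefix` variable and `checkpoints` list are illustrative helpers, not an official API):

```python
import math

TOTAL_STEPS = 1160  # total PPO global steps (10 epochs over GSM8K train)

# Stdlib equivalent of [int(n) for n in np.logspace(-2.1, 0, 15) * 1160]:
# 15 exponents evenly spaced from -2.1 to 0, each mapped through 10**x.
exponents = [-2.1 * (14 - i) / 14 for i in range(15)]
steps = [int(10 ** e * TOTAL_STEPS) for e in exponents]
# -> [9, 13, 18, 25, 36, 51, 73, 103, 146, 206, 291, 411, 581, 821, 1160]

# Illustrative: enumerate every checkpoint repo id listed on this card,
# e.g. for sweeping evaluations across training.
prefix = "rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2"
checkpoints = (
    [f"{prefix}_base"]
    + [f"{prefix}_episode{i}" for i in range(1, 10)]
    + [f"{prefix}_global_step{s}" for s in steps[:-1]]  # step 1160 is the final model
    + [f"{prefix}_ppo"]
)
print(steps)
print(len(checkpoints))
```

Note that the last value of the schedule, 1160, is not saved as a `_global_step` checkpoint; it corresponds to the final `_ppo` model.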
Note that the current model, `rosieyzh/OLMo-1B-as_fm3_tg_omi1_omi2_ppo`, is the final model after RLVR and is equivalent to `_episode10` and `_global_step1160`.