follow-reward-model-test / last-checkpoint
penglingwei
Training in progress, step 500, checkpoint
Commit 61439ed (verified)