Hyperparameters and data for this model?

#2 opened by jwkirchenbauer

Hi, could you share the hyperparameters, training data, and data preparation used for this model?

We're continuing the training of the Llama-2-7b-hf model on the mmlu_recall dataset. Training runs for one epoch with a batch size of 4 million tokens and a constant learning rate of 3e-5. This approach is designed to improve the model's performance on the key MMLU/CMMLU metrics without degrading other benchmarks.
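For reference, here is a minimal sketch of what that continued-pretraining setup could look like with the Hugging Face `transformers` Trainer. Only the base model, the one-epoch schedule, the constant 3e-5 learning rate, and the ~4M-token batch are stated above; the dataset file path, the 4096-token sequence length, and the per-device batch / gradient-accumulation split used to reach ~4M tokens per step are assumptions for illustration.

```python
# Sketch of the continued-pretraining setup described above.
# Assumed (not confirmed): mmlu_recall as a local JSONL file, 4096-token
# sequences, and 8 GPUs x batch 8 x grad-accum 16 = 1024 sequences/step,
# i.e. 1024 * 4096 ≈ 4.2M tokens per optimizer step.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical dataset location; the actual mmlu_recall source is not given.
raw = load_dataset("json", data_files="mmlu_recall.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="llama2-7b-mmlu-recall",
    num_train_epochs=1,               # one epoch, as stated
    learning_rate=3e-5,               # constant LR of 3e-5, as stated
    lr_scheduler_type="constant",
    per_device_train_batch_size=8,    # assumed split; together with grad accum
    gradient_accumulation_steps=16,   # and 8 GPUs this approximates the stated
    bf16=True,                        # ~4M-token global batch
    logging_steps=10,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```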

itsliupeng changed discussion status to closed
