Hyperparameters and data for this model?
#2 by jwkirchenbauer - opened
Hi, I'm curious whether you can share the hyperparameters, training data, and preparation used for this model?
We're continuing training of the Llama-2-7b-hf model on the mmlu_recall dataset. Training runs for one epoch with a batch size of 4 million tokens and a constant learning rate of 3e-5. This approach is designed to improve the model's performance on key MMLU/CMMLU metrics without degrading other benchmarks.
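For reference, a back-of-the-envelope sketch of how a 4M-token batch could map onto a gradient-accumulation setup. The sequence length, GPU count, and per-device batch size below are assumptions for illustration only; the thread does not specify them.

```python
# Sketch: assembling an effective batch of ~4M tokens via gradient
# accumulation. All hardware/shape values are hypothetical, not the
# authors' exact configuration.

TARGET_TOKENS_PER_BATCH = 4_000_000  # "batch size of 4 million tokens"
SEQ_LEN = 4096           # Llama-2 context length, assuming sequences packed to full length
NUM_GPUS = 8             # hypothetical
PER_DEVICE_BATCH = 4     # hypothetical sequences per GPU per step

# Tokens processed in one optimizer-free forward/backward step.
tokens_per_step = SEQ_LEN * NUM_GPUS * PER_DEVICE_BATCH

# Accumulate enough steps to reach at least the target batch size
# (ceiling division).
grad_accum = -(-TARGET_TOKENS_PER_BATCH // tokens_per_step)

effective_tokens = tokens_per_step * grad_accum
print(grad_accum, effective_tokens)
```

With these assumed values, 31 accumulation steps give an effective batch of about 4.06M tokens; the constant 3e-5 learning rate from the reply would then be set with no warmup decay schedule.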
itsliupeng changed discussion status to closed