Hyperparameters and data for this model?

#2 opened by jwkirchenbauer

Hi, could you share the hyperparameters, training data, and data preparation used for this model?

We're continuing the training of the Llama-2-7b-hf model on the mmlu_recall dataset. Training runs for one epoch with a batch size of 4 million tokens and a constant learning rate of 3e-5. This approach is designed to improve the model's performance on the key MMLU/CMMLU metrics without degrading other benchmarks.
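For reference, here is a minimal sketch of what that continued-pretraining setup could look like with the Hugging Face `transformers` Trainer. Only the base model, the one-epoch schedule, the constant 3e-5 learning rate, and the ~4M-token batch are stated above; the dataset file path, the 4096-token sequence length, and the per-device batch / gradient-accumulation split used to reach ~4M tokens per step are assumptions for illustration.

```python
# Sketch of the continued-pretraining setup described above.
# Assumed (not confirmed): mmlu_recall as a local JSONL file, 4096-token
# sequences, and 8 GPUs x batch 8 x grad-accum 16 = 1024 sequences/step,
# i.e. 1024 * 4096 ≈ 4.2M tokens per optimizer step.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical dataset location; the actual mmlu_recall source is not given.
raw = load_dataset("json", data_files="mmlu_recall.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

args = TrainingArguments(
    output_dir="llama2-7b-mmlu-recall",
    num_train_epochs=1,               # one epoch, as stated
    learning_rate=3e-5,               # constant LR of 3e-5, as stated
    lr_scheduler_type="constant",
    per_device_train_batch_size=8,    # assumed split; together with grad accum
    gradient_accumulation_steps=16,   # and 8 GPUs this approximates the stated
    bf16=True,                        # ~4M-token global batch
    logging_steps=10,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```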

itsliupeng changed discussion status to closed
