This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set:
More information needed
The following training and evaluation metrics were logged during training:
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.2571 | 0.8550 | 400 | 1.2552 | -0.5792 | -0.7270 | 0.6626 | 0.1478 | -0.7270 | -0.5792 | -0.3939 | -0.3830 |
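
The columns in the row above are related by simple arithmetic, which the following minimal sketch checks. It assumes that the reported margin is the chosen reward minus the rejected reward, as is conventional in preference-optimization trainers, and that steps accrue linearly with epochs; neither assumption is stated in this card, and the variable names are illustrative only.

```python
# Values copied from the step-400 row of the training results table.
rewards_chosen = -0.5792
rewards_rejected = -0.7270
reported_margin = 0.1478

# Assumption: margin = chosen reward - rejected reward.
margin = rewards_chosen - rewards_rejected
print(f"recomputed margin: {margin:.4f}")

# Assumption: the step count grows linearly with the epoch fraction,
# so step / epoch approximates the number of optimizer steps per epoch.
epoch, step = 0.8550, 400
steps_per_epoch = step / epoch
print(f"approx. steps per epoch: {steps_per_epoch:.1f}")
```

Under these assumptions the recomputed margin matches the reported Rewards/margins value to four decimal places, and the run works out to a little under 470 steps per epoch.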