Synthetic sampling
Collection
A collection of models fine-tuned using various configurations of social survey data for the synthetic sampling use case.
5 items
This model is a fine-tuned version of oxford-llms/llama3-1-ox-llms-8b-sft-full on the argilla/dpo-mix-7k dataset. It achieves the following results on the evaluation set:
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
The following hyperparameters were used during training:
Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.528 | 0.9479 | 100 | 0.5267 | -0.0255 | -0.6055 | 0.7604 | 0.5800 | -330.5430 | -320.6201 | -1.3169 | -1.3159 |
| 0.3731 | 1.8957 | 200 | 0.4821 | -0.2481 | -1.0900 | 0.7604 | 0.8419 | -340.2323 | -325.0733 | -1.3099 | -1.3082 |
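The reward and margin columns in the table above are the implicit rewards that DPO derives from policy and reference log-probabilities. A minimal sketch of how those quantities relate, with an assumed KL coefficient `beta = 0.1` and toy log-prob values (the card does not state the actual beta used):

```python
import math

BETA = 0.1  # assumed KL penalty coefficient; the value actually used in training is not stated


def dpo_stats(policy_logp_chosen, policy_logp_rejected,
              ref_logp_chosen, ref_logp_rejected, beta=BETA):
    """Implicit DPO rewards, margin, and per-pair loss for one preference pair.

    The outputs mirror the table columns Rewards/chosen, Rewards/rejected,
    Rewards/margins, and the validation loss (averaged over pairs in training).
    """
    # Implicit reward: beta-scaled log-prob shift of the policy vs. the reference
    reward_chosen = beta * (policy_logp_chosen - ref_logp_chosen)
    reward_rejected = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = reward_chosen - reward_rejected
    # DPO loss for the pair: -log(sigmoid(margin))
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss


# Toy sequence log-probs: the policy shifts probability mass toward the
# chosen response relative to the reference model.
rc, rr, m, loss = dpo_stats(
    policy_logp_chosen=-320.0, policy_logp_rejected=-340.0,
    ref_logp_chosen=-318.0, ref_logp_rejected=-334.0,
)
# rc = -0.2, rr = -0.6, m = 0.4, loss = log(1 + e^-0.4) ≈ 0.513
```

With these toy numbers the outputs land in the same range as the table rows: both rewards can be negative (both responses became less likely than under the reference) while the margin stays positive, which is what drives the Rewards/accuracies column.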
Base model
meta-llama/Llama-3.1-8B