lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half

ptrdvn committed on May 30, 2024

Commit f327725 · verified · 1 Parent(s): b6a178e

Update README.md

Files changed (1)

README.md +8 -0

@@ -24,6 +24,14 @@ Note that this model has a non-commerical license as we used the Command R and C
 We are currently working on developing a commercially usable model, so stay tuned for that!
+# Model list
+We have ORPO-trained the following models using different proportions of the [lightblue/mitsu](https://huggingface.co/datasets/lightblue/mitsu) dataset:
+* Trained on the top/bottom responses of all prompts in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-full](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-full)
+* Trained on the top/bottom responses of the prompts of the 75\% most consistently ranked responses in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top75](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top75)
+* Trained on the top/bottom responses of the prompts of the 50\% most consistently ranked responses in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half)
+* Trained on the top/bottom responses of the prompts of the 25\% most consistently ranked responses in the dataset: [lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top25](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-top25)
 # Model results
 We compare the MT-Bench scores across 6 languages for our 4 ORPO-trained models, as well as some baselines: