Corianas committed

Commit 9e4b4a4 (parent: 1ed5a3a)

Update README.md

Files changed (1): README.md (+2, -6)
@@ -1,4 +1,5 @@
 ---
+base_model: UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3
 datasets:
 - openbmb/UltraFeedback
 language:
@@ -109,14 +110,9 @@ Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)
 
 # Llama-3-Instruct-8B-SPPO-Iter3
 
-This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 3, based on the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) architecture as starting point. We utilized the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, splited to 3 parts for 3 iterations by [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
+This model is a GPTQ quantization of the SPPO model developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 3, with [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the starting point. We utilized the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into 3 parts for 3 iterations following [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
 
 
-## Links to Other Models
-- [Llama-3-Instruct-8B-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter1)
-- [Llama-3-Instruct-8B-SPPO-Iter2](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter2)
-- [Llama-3-Instruct-8B-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3)
-
 ### Model Description
 
 - Model type: An 8B parameter GPT-like model fine-tuned on synthetic datasets.
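The updated description says the UltraFeedback prompt set was split into 3 parts, one per SPPO iteration. As an illustrative sketch only (not the authors' code; the helper name and prompt list are hypothetical), such a partition might look like:

```python
# Hypothetical sketch: partition a prompt set into contiguous chunks,
# one chunk per SPPO training iteration.
def split_for_iterations(prompts, n_iters=3):
    """Split `prompts` into `n_iters` near-equal contiguous parts."""
    base, rem = divmod(len(prompts), n_iters)
    chunks, start = [], 0
    for i in range(n_iters):
        # The first `rem` chunks absorb one extra prompt each.
        end = start + base + (1 if i < rem else 0)
        chunks.append(prompts[start:end])
        start = end
    return chunks

prompts = [f"prompt-{i}" for i in range(10)]
parts = split_for_iterations(prompts)
print([len(p) for p in parts])  # → [4, 3, 3]
```

Each chunk would then drive one round of synthetic-response generation and preference optimization, with the iteration-2 and iteration-3 models trained on the later chunks.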