Update README.md
README.md CHANGED
@@ -1,4 +1,5 @@
 ---
+base_model: UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3
 datasets:
 - openbmb/UltraFeedback
 language:
@@ -109,14 +110,9 @@ Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)
 
 # Llama-3-Instruct-8B-SPPO-Iter3
 
-This model
-
-
-## Links to Other Models
-- [Llama-3-Instruct-8B-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter1)
-- [Llama-3-Instruct-8B-SPPO-Iter2](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter2)
-- [Llama-3-Instruct-8B-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3)
-
+This model is a GPTQ quantization of the SPPO model developed with [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 3, starting from the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) architecture. We utilized prompts from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into three parts for the three iterations following [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
+
+
 ### Model Description
 
 - Model type: An 8B parameter GPT-like model fine-tuned on synthetic datasets.