File size: 3,191 Bytes
6c053f3 66e7346 6c053f3 cb65242 6c053f3 cb65242 66e7346 142cae5 cb65242 66e7346 6c053f3 66e7346 6c053f3 66e7346 6c053f3 66e7346 9c95ccd 66e7346 6c053f3 b561367 66e7346 6c053f3 66e7346 fc720f3 6362453 6c053f3 a074a16 6c053f3 66e7346 6c053f3 142cae5 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 |
---
license: mit
tags:
- alignment-handbook
- dpo
- trl
- selm
base_model: ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: SELM-Llama-3-8B-Instruct-iter-3
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
[Self-Exploring Language Models: Active Preference Elicitation for Online Alignment](https://arxiv.org/abs/2405.19332).
# SELM-Llama-3-8B-Instruct-iter-3
This model is a fine-tuned version of [ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2](https://huggingface.co/ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2) using synthetic data based on on the HuggingFaceH4/ultrafeedback_binarized dataset.
## Model description
- Model type: A 8B parameter Llama3-instruct-based Self-Exploring Language Models (SELM).
- License: MIT
## Results
| | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
|----------------------------------------|------------------------|--------------------|
| [SELM-Llama-3-8B-Instruct-iter-3](https://huggingface.co/ZhangShenao/SELM-Llama-3-8B-Instruct-iter-3) |        33.47 |       8.29 |
| [SELM-Llama-3-8B-Instruct-iter-2](https://huggingface.co/ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2) |        35.65 |       8.09 |
| [SELM-Llama-3-8B-Instruct-iter-1](https://huggingface.co/ZhangShenao/SELM-Llama-3-8B-Instruct-iter-1) |        32.02 |       7.92 |
| [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |        24.31 |       7.93 |
Our model also ranks highly on [WildBench](https://huggingface.co/spaces/allenai/WildBench)! 🔥
### Training hyperparameters
The following hyperparameters were used during training:
- alpha: 0.0001
- beta: 0.01
- train_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
### Framework versions
- Transformers 4.40.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ZhangShenao__SELM-Llama-3-8B-Instruct-iter-3)
| Metric |Value|
|-------------------|----:|
|Avg. |23.56|
|IFEval (0-Shot) |69.03|
|BBH (3-Shot) |29.08|
|MATH Lvl 5 (4-Shot)| 5.74|
|GPQA (0-shot) | 1.12|
|MuSR (0-shot) | 5.50|
|MMLU-PRO (5-shot) |30.92|
|