---
license: apache-2.0
tags:
- moe
---
![](https://i.imgur.com/vq1QHEA.jpg)
# Beyonder-4x7B-v2
This model is a Mixture of Experts (MoE) made with [mergekit](https://github.com/cg123/mergekit) (mixtral branch). It uses the following base models:
* [openchat/openchat-3.5-1210](https://huggingface.co/openchat/openchat-3.5-1210)
* [beowolx/CodeNinja-1.0-OpenChat-7B](https://huggingface.co/beowolx/CodeNinja-1.0-OpenChat-7B)
* [maywell/PiVoT-0.1-Starling-LM-RP](https://huggingface.co/maywell/PiVoT-0.1-Starling-LM-RP)
* [WizardLM/WizardMath-7B-V1.1](https://huggingface.co/WizardLM/WizardMath-7B-V1.1)
## 🏆 Evaluation
Beyonder-4x7B-v2 is competitive with Mixtral-8x7B-Instruct-v0.1 on the Open LLM Leaderboard, while only having 4 experts instead of 8.
![](https://i.imgur.com/5raBff0.png)
It also displays a significant improvement over the individual experts.
![](https://i.imgur.com/7Idwkb0.png)
It also performs very well compared to other models on the Nous benchmark suite. It is almost as good as the best Yi-34B fine-tune, which is a much bigger model: Beyonder has 24.2B parameters, but only two experts are selected during inference (so ~12B active parameters), versus 34B parameters for Yi-34B.
| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|--------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[**Beyonder-4x7B-v2**](https://huggingface.co/shadowml/Beyonder-4x7B-v2)| **45.29**| **75.95**| <u>**60.86**</u>| **46.4**| **57.13**|
|[NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B)| 43.67| 73.24| 55.37| 41.76| 53.51|
|[OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B)| 42.75| 72.99| 52.99| 40.94| 52.42|
|[Nous-Hermes-2-SOLAR-10.7B](https://huggingface.co/NousResearch/Nous-Hermes-2-SOLAR-10.7B)| 47.79| 74.69| 55.92| 44.84| 55.81|
|[Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B)| <u>50.27</u>| <u>76.00</u>| 60.34| <u>46.69</u>| <u>58.33</u>|
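
The parameter figures quoted above can be sanity-checked with a rough count. The sketch below is only an estimate: it assumes the standard Mistral-7B dimensions (hidden size 4096, MLP size 14336, 32 layers, 32k vocabulary, grouped-query attention with 8 key/value heads), none of which are stated in this card, and the helper name `moe_param_count` is made up for illustration.

```python
# Rough parameter count for a 4-expert MoE built from Mistral-7B-style models.
# ASSUMPTION: standard Mistral-7B dimensions (not stated in this model card).
HIDDEN = 4096
INTERMEDIATE = 14336
LAYERS = 32
VOCAB = 32000
KV_DIM = 1024  # 8 key/value heads x 128 head dim (grouped-query attention)


def moe_param_count(num_experts: int, experts_used: int) -> int:
    """Count the parameters involved when `experts_used` expert MLPs run per layer."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q/o and k/v projections
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up and down projections of one expert
    router = HIDDEN * num_experts  # one linear gate per layer
    norms = 2 * HIDDEN  # two RMSNorm weight vectors per layer
    per_layer = attention + experts_used * mlp + router + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embeddings + output head
    return LAYERS * per_layer + embeddings + HIDDEN  # + final norm


total_params = moe_param_count(num_experts=4, experts_used=4)
active_params = moe_param_count(num_experts=4, experts_used=2)
print(f"total:  {total_params / 1e9:.1f}B")
print(f"active: {active_params / 1e9:.1f}B")
```

Under these assumptions the total comes out at about 24.2B and the per-token active count at just under 13B, which is in line with the rounded figures quoted above. Attention layers and embeddings are shared across experts, which is why the total is well below 4 × 7B.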
### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |23.62|± | 2.67|
| | |acc_norm|23.62|± | 2.67|
|agieval_logiqa_en | 0|acc |41.47|± | 1.93|
| | |acc_norm|43.01|± | 1.94|
|agieval_lsat_ar | 0|acc |23.04|± | 2.78|
| | |acc_norm|23.48|± | 2.80|
|agieval_lsat_lr | 0|acc |51.57|± | 2.22|
| | |acc_norm|52.94|± | 2.21|
|agieval_lsat_rc | 0|acc |64.31|± | 2.93|
| | |acc_norm|64.68|± | 2.92|
|agieval_sat_en | 0|acc |79.13|± | 2.84|
| | |acc_norm|79.13|± | 2.84|
|agieval_sat_en_without_passage| 0|acc |43.20|± | 3.46|
| | |acc_norm|43.20|± | 3.46|
|agieval_sat_math | 0|acc |34.55|± | 3.21|
| | |acc_norm|32.27|± | 3.16|
### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |61.86|± | 1.42|
| | |acc_norm|64.51|± | 1.40|
|arc_easy | 0|acc |85.06|± | 0.73|
| | |acc_norm|82.45|± | 0.78|
|boolq | 1|acc |88.35|± | 0.56|
|hellaswag | 0|acc |68.04|± | 0.47|
| | |acc_norm|85.12|± | 0.36|
|openbookqa | 0|acc |37.80|± | 2.17|
| | |acc_norm|48.60|± | 2.24|
|piqa | 0|acc |83.08|± | 0.87|
| | |acc_norm|83.95|± | 0.86|
|winogrande | 0|acc |78.69|± | 1.15|
### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |44.55|± | 1.74|
| | |mc2 |60.86|± | 1.57|
### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|58.95|± | 3.58|
|bigbench_date_understanding | 0|multiple_choice_grade|66.40|± | 2.46|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|48.84|± | 3.12|
|bigbench_geometric_shapes | 0|multiple_choice_grade|22.56|± | 2.21|
| | |exact_str_match |13.37|± | 1.80|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|30.40|± | 2.06|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|20.57|± | 1.53|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|52.00|± | 2.89|
|bigbench_movie_recommendation | 0|multiple_choice_grade|44.40|± | 2.22|
|bigbench_navigate | 0|multiple_choice_grade|52.10|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|69.75|± | 1.03|
|bigbench_ruin_names | 0|multiple_choice_grade|55.36|± | 2.35|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|23.65|± | 1.35|
|bigbench_snarks | 0|multiple_choice_grade|77.35|± | 3.12|
|bigbench_sports_understanding | 0|multiple_choice_grade|73.02|± | 1.41|
|bigbench_temporal_sequences | 0|multiple_choice_grade|46.80|± | 1.58|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.08|± | 1.17|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|19.03|± | 0.94|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|52.00|± | 2.89|
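
The per-task tables above tie back to the summary table. The card does not state the aggregation rule, but the numbers are consistent with a plain mean per suite: `acc_norm` for AGIEval, `acc_norm` (or `acc` where no normalized score is listed) for GPT4All, `mc2` for TruthfulQA, and `multiple_choice_grade` for Bigbench. The sketch below recomputes the summary row under that assumption.

```python
from statistics import mean

# Values copied from the per-task tables above.
agieval_acc_norm = [23.62, 43.01, 23.48, 52.94, 64.68, 79.13, 43.20, 32.27]
# acc_norm where reported, otherwise acc (boolq, winogrande).
gpt4all = [64.51, 82.45, 88.35, 85.12, 48.60, 83.95, 78.69]
truthfulqa_mc2 = 60.86
bigbench_mc_grade = [
    58.95, 66.40, 48.84, 22.56, 30.40, 20.57, 52.00, 44.40, 52.10,
    69.75, 55.36, 23.65, 77.35, 73.02, 46.80, 22.08, 19.03, 52.00,
]

# ASSUMPTION: each suite score is an unweighted mean over its tasks.
suite_scores = {
    "AGIEval": mean(agieval_acc_norm),
    "GPT4All": mean(gpt4all),
    "TruthfulQA": truthfulqa_mc2,
    "Bigbench": mean(bigbench_mc_grade),
}
overall_average = mean(suite_scores.values())

for suite, score in suite_scores.items():
    print(f"{suite:<10} {score:.2f}")
print(f"{'Average':<10} {overall_average:.2f}")
```

The recomputed values match the Beyonder-4x7B-v2 row of the summary table (45.29, 75.95, 60.86, 46.4, and 57.13).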
## 🧩 Configuration
```yaml
base_model: mlabonne/Marcoro14-7B-slerp
experts:
  - source_model: openchat/openchat-3.5-1210
    positive_prompts:
    - "chat"
    - "assistant"
    - "tell me"
    - "explain"
  - source_model: beowolx/CodeNinja-1.0-OpenChat-7B
    positive_prompts:
    - "code"
    - "python"
    - "javascript"
    - "programming"
    - "algorithm"
  - source_model: maywell/PiVoT-0.1-Starling-LM-RP
    positive_prompts:
    - "storywriting"
    - "write"
    - "scene"
    - "story"
    - "character"
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
    - "reason"
    - "math"
    - "mathematics"
    - "solve"
    - "count"
```
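
To make the "only two experts are selected during inference" point concrete, here is a minimal sketch of Mixtral-style top-2 gating in plain Python. It is an illustration, not mergekit's or transformers' actual code: the expert labels, the scalar "expert outputs", and the function names `top2_route` and `moe_output` are all hypothetical, and the sketch does not show how the positive prompts are turned into router weights.

```python
import math

# Hypothetical labels mirroring the four experts in the configuration above.
EXPERTS = ["chat", "code", "roleplay", "math"]


def top2_route(router_logits):
    """Return [(expert_index, weight), ...] for the two highest-scoring experts."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top2 = ranked[:2]
    # Softmax restricted to the two selected experts, so the weights sum to 1.
    peak = max(router_logits[i] for i in top2)
    exps = [math.exp(router_logits[i] - peak) for i in top2]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top2, exps)]


def moe_output(router_logits, expert_outputs):
    """Blend the outputs of the two selected experts by their routing weights."""
    return sum(weight * expert_outputs[i] for i, weight in top2_route(router_logits))


# Toy example: a token whose router scores favor the "code" and "math" experts.
logits = [0.1, 2.0, -1.0, 1.0]
routes = top2_route(logits)
for index, weight in routes:
    print(f"{EXPERTS[index]}: {weight:.3f}")
blended = moe_output(logits, expert_outputs=[10.0, 20.0, 30.0, 40.0])
```

In a real model the router runs per token and per layer, and the expert outputs are hidden-state vectors rather than scalars; the unselected experts are simply not evaluated, which is where the inference savings come from.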
## 💻 Usage
```python
# Notebook cell (IPython `!` syntax); in a terminal, run the pip command without the `!`.
!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/Beyonder-4x7B-v2"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    # Load the weights in 4-bit (via bitsandbytes) and compute in float16.
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the conversation with the model's chat template before generating.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```