Update README.md
README.md
CHANGED
@@ -18,7 +18,7 @@ pipeline_tag: text-classification
 
 # Llama3-8B-SuperNova-Spectrum-dare_ties
 
-Llama3-8B-SuperNova-Spectrum-dare_ties is a `DARE_TIES` merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
+Llama3-8B-SuperNova-Spectrum-dare_ties is a `dare_ties` merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
 * [yuvraj17/Llama-3-8B-spectrum-25](https://huggingface.co/yuvraj17/Llama-3-8B-spectrum-25)
 * [ruggsea/Llama3-stanford-encyclopedia-philosophy-QA](https://huggingface.co/ruggsea/Llama3-stanford-encyclopedia-philosophy-QA)
 * [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite)
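As a quick orientation, here is a minimal, hypothetical sketch of running the merged model with 🤗 Transformers. The repo id `yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties` is assumed from the model name and may differ from the actual repository:

```python
# Hypothetical usage sketch; the repo id below is assumed, not confirmed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "yuvraj17/Llama3-8B-SuperNova-Spectrum-dare_ties"  # assumed
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "What is model merging?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```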
@@ -31,6 +31,16 @@ Llama3-8B-SuperNova-Spectrum-dare_ties is a `DARE_TIES` merge of the following m
 * **Redundancy Removal**: Identifies and eliminates overlapping or unnecessary information between models, making the final model more efficient.
 * **Conflict Resolution**: Reconciles differences between models by creating a unified sign vector that represents the most dominant direction of change across all models.
 
+**TIES** stands for **T**R**I**M, **E**LECT **S**IGN & MERGE (TIES-Merging).
+
+<figure>
+
+<img src="https://cdn-uploads.huggingface.co/production/uploads/66137d95e8d2cda230ddcea6/2vBgcGko-tcsaAkLUzHnU.png" width="1000" height="768">
+<figcaption> How TIES-Merging Works <a href="https://arxiv.org/pdf/2306.01708">Reference</a> </figcaption>
+
+</figure>
+
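To make the trim, elect-sign, and merge steps above concrete, here is a minimal sketch of TIES-style merging on toy tensors in plain PyTorch. It is an editor's illustration under assumptions, not Mergekit's actual implementation; the helper name `ties_merge` and the `density` parameter (fraction of weights kept after trimming) are made up for the example.

```python
# Editor's sketch of TIES-style merging: trim -> elect sign -> merge.
# Illustrative only; Mergekit's real implementation differs in detail.
import torch

def ties_merge(base: torch.Tensor, finetuned: list[torch.Tensor],
               density: float = 0.5) -> torch.Tensor:
    # Task vectors: what each fine-tune changed relative to the base.
    deltas = [ft - base for ft in finetuned]

    # Trim: keep only the top-`density` fraction of each delta by magnitude.
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.numel()))
        threshold = d.abs().flatten().kthvalue(d.numel() - k + 1).values
        trimmed.append(torch.where(d.abs() >= threshold, d, torch.zeros_like(d)))

    # Elect sign: the dominant direction of change per parameter, taken
    # from the magnitude-weighted sum across all trimmed deltas.
    sign = torch.stack(trimmed).sum(dim=0).sign()

    # Merge: average only the deltas that agree with the elected sign.
    agree = [torch.where(d.sign() == sign, d, torch.zeros_like(d)) for d in trimmed]
    counts = torch.stack([(a != 0).float() for a in agree]).sum(dim=0).clamp(min=1)
    return base + torch.stack(agree).sum(dim=0) / counts

# Toy usage: merge three fine-tunes of one 2x2 "layer".
base = torch.zeros(2, 2)
merged = ties_merge(base, [torch.randn(2, 2) for _ in range(3)], density=0.5)
```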
 ### DARE Merging
 
 Introduced by Yu et al. (2023), [DARE](https://arxiv.org/abs/2311.03099) uses an approach similar to TIES with two main differences:
@@ -38,9 +48,13 @@ Introduced by Yu et al. (2023), [DARE](https://arxiv.org/abs/2311.03099) uses an
 * **Weight Pruning**: Randomly resets some fine-tuned weights to their original values, reducing model complexity.
 * **Weight Scaling**: Adjusts the remaining weights by scaling and combining them with the base model's weights to maintain consistent performance.
 
-
+**DARE** stands for **D**ROP **A**ND **RE**SCALE.
+
+Mergekit’s implementation of DARE-Merging has two flavours: with the sign election step of TIES (`dare_ties`) or without (`dare_linear`). I have chosen `dare_ties` for this merge.
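The drop-and-rescale step itself is small enough to sketch. Below is an editor's illustration in PyTorch, not Mergekit's actual code; the `drop_prob` parameter name is made up. In `dare_ties`, the rescaled deltas from each model would then go through the same sign election and merge as in the TIES sketch above, while `dare_linear` skips the election and combines them linearly.

```python
# Editor's sketch of DARE (DROP AND RESCALE) on a single weight tensor.
# Illustrative only; Mergekit's real implementation differs in detail.
import torch

def dare(base: torch.Tensor, finetuned: torch.Tensor,
         drop_prob: float = 0.9) -> torch.Tensor:
    # Task vector: what fine-tuning changed relative to the base model.
    delta = finetuned - base

    # Drop: randomly reset a `drop_prob` fraction of the delta to zero,
    # so those weights fall back to their base-model values.
    mask = (torch.rand_like(delta) >= drop_prob).to(delta.dtype)

    # Rescale: divide the surviving delta by (1 - drop_prob) so the
    # expected magnitude of the overall update is preserved.
    return base + delta * mask / (1.0 - drop_prob)
```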
 
 For more information, refer to [Merge Large Language Models with MergeKit by Maxime Labonne](https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54).
+
+Also, if you want to get in-depth knowledge about model merging and its different types, I highly recommend this [YouTube video by Julien Simon](https://youtu.be/cvOpX75Kz4M?si=d5crVWSxcjvNUm6a).
 
 ## 🧩 Configuration
 
@@ -96,4 +110,5 @@ Coming soon
 
 ## Special thanks & Reference
 - Maxime Labonne for their easy-to-use Colab notebook [Merging LLMs with MergeKit](https://github.com/mlabonne/llm-course/blob/main/Mergekit.ipynb) and [Blog](https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54)
+- Authors of [Mergekit](https://github.com/arcee-ai/mergekit)
 