yuvraj17 committed
Commit
ccdfe0e
1 Parent(s): bc6cbd8

Update README.md

Files changed (1)
  1. README.md +18 -3
README.md CHANGED
@@ -18,7 +18,7 @@ pipeline_tag: text-classification
 
 # Llama3-8B-SuperNova-Spectrum-dare_ties
 
-Llama3-8B-SuperNova-Spectrum-dare_ties is a `DARE_TIES` merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
+Llama3-8B-SuperNova-Spectrum-dare_ties is a `dare_ties` merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
 * [yuvraj17/Llama-3-8B-spectrum-25](https://huggingface.co/yuvraj17/Llama-3-8B-spectrum-25)
 * [ruggsea/Llama3-stanford-encyclopedia-philosophy-QA](https://huggingface.co/ruggsea/Llama3-stanford-encyclopedia-philosophy-QA)
 * [arcee-ai/Llama-3.1-SuperNova-Lite](https://huggingface.co/arcee-ai/Llama-3.1-SuperNova-Lite)
@@ -31,6 +31,16 @@ Llama3-8B-SuperNova-Spectrum-dare_ties is a `DARE_TIES` merge of the following m
 * **Redundancy Removal**: Identifies and eliminates overlapping or unnecessary information between models, making the final model more efficient.
 * **Conflict Resolution**: Reconciles differences between models by creating a unified sign vector that represents the most dominant direction of change across all models.
 
+**TIES** stands for **T**R**I**M, **E**LECT **S**IGN & MERGE (TIES-MERGING).
+
+<figure>
+  <img src="https://cdn-uploads.huggingface.co/production/uploads/66137d95e8d2cda230ddcea6/2vBgcGko-tcsaAkLUzHnU.png" width="1000" height="768">
+  <figcaption>How TIES-Merging works (<a href="https://arxiv.org/pdf/2306.01708">Reference</a>)</figcaption>
+</figure>
+
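To make the three TIES steps concrete, here is a minimal PyTorch sketch of TIES-style merging over flat weight tensors. It is an illustration only; the function and parameter names are mine, not Mergekit's implementation, which works tensor by tensor and handles densities, dtypes, and many edge cases.

```python
import torch

def ties_merge(base, fine_tuned, density=0.5):
    # Task vectors: how each fine-tuned model differs from the base.
    deltas = [ft - base for ft in fine_tuned]

    # TRIM: keep only the top-`density` fraction of each task vector
    # by magnitude; all other entries are reset to zero.
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.numel()))
        threshold = d.abs().kthvalue(d.numel() - k + 1).values
        trimmed.append(torch.where(d.abs() >= threshold, d, torch.zeros_like(d)))
    stacked = torch.stack(trimmed)

    # ELECT SIGN: build the unified sign vector that follows the
    # dominant direction of change across all models.
    elected = torch.sign(stacked.sum(dim=0))

    # MERGE: average only the entries whose sign agrees with the
    # elected sign, which resolves conflicts between models.
    agree = (torch.sign(stacked) == elected) & (stacked != 0)
    merged_delta = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return base + merged_delta
```

In `dare_ties`, this sign-election and merge step is applied to task vectors that have first gone through DARE's drop-and-rescale, sketched in the next section.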
 ### DARE Merging
 
 Introduced by Yu et al. (2023), [DARE](https://arxiv.org/abs/2311.03099) uses an approach similar to TIES with two main differences:
@@ -38,9 +48,13 @@ Introduced by Yu et al. (2023), [DARE](https://arxiv.org/abs/2311.03099) uses an
 * **Weight Pruning**: Randomly resets some fine-tuned weights to their original values, reducing model complexity.
 * **Weight Scaling**: Adjusts the remaining weights by scaling and combining them with the base model's weights to maintain consistent performance.
 
-Mergekit’s implementation of this method has two flavours: with the sign election step of TIES (`dare_ties`) or without (`dare_linear`).
+**DARE** stands for **D**ROP **A**ND **RE**SCALE.
+
+Mergekit’s implementation of DARE-Merging has two flavours: with the sign election step of TIES (`dare_ties`) or without (`dare_linear`). I have chosen `dare_ties` for this merge.
 
 For more information, refer to [Merge Large Language Models with MergeKit by Maxime Labonne](https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54).
+
+Also, if you want to get in-depth knowledge about model merging and its different types, I highly recommend this [YouTube video by Julien Simon](https://youtu.be/cvOpX75Kz4M?si=d5crVWSxcjvNUm6a).
 
 ## 🧩 Configuration
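As above, a short sketch may help. The following illustrative PyTorch function shows DARE's drop-and-rescale on a single task vector; the names are mine, and this is not Mergekit's actual API.

```python
import torch

def dare(base, fine_tuned, drop_rate=0.9):
    # Task vector of one fine-tuned model.
    delta = fine_tuned - base

    # DROP: randomly reset a `drop_rate` fraction of the fine-tuned
    # deltas to zero, i.e. back to the base model's original values.
    keep = (torch.rand_like(delta) > drop_rate).to(delta.dtype)

    # RESCALE: scale the surviving deltas by 1 / (1 - drop_rate) so the
    # expected magnitude of the task vector stays consistent.
    return base + (delta * keep) / (1.0 - drop_rate)
```

Roughly speaking, `dare_linear` then combines the DARE-processed task vectors linearly, while `dare_ties` (used here) passes them through the TIES sign-election and merge step sketched earlier.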
 
@@ -96,4 +110,5 @@ Coming soon
 
 ## Special thanks & Reference
 - Maxime Labonne for their easy-to-use Colab notebook [Merging LLMs with MergeKit](https://github.com/mlabonne/llm-course/blob/main/Mergekit.ipynb) and [Blog](https://towardsdatascience.com/merge-large-language-models-with-mergekit-2118fb392b54)
 - Authors of [Mergekit](https://github.com/arcee-ai/mergekit)
 
 