HiroseKoichi/Llama-3-8B-Stroganoff-4.0

# Llama-3-8B-Stroganoff-4.0-Version-B
Since V3, I've tested a lot of old models, looked at some new ones, and tried every merge method available in mergekit. There are two versions of 4.0: A and B. Version A has better writing and version B has better instruction following, but the two are very similar; I personally prefer version A. Both came out of experiments I was running on model order, which is why all the models use the same parameters, but they turned out well enough that I decided to upload them. If you've been doing merges yourself, most or all of the following will be redundant, but some of it was not at all obvious to me, so I hope it helps others looking for more information.
Ties is not better than Task-Arithmetic, and Task-Arithmetic is not better than Ties; each has advantages that make it the better choice in different situations. Ties aims to reduce interference between models by keeping the weight changes that agree in sign and zeroing out the rest. If you use Ties with several models that do different things, some aspects of a model may get erased if they don't have a strong enough presence. The order of the models does not matter in a Ties merge because all of the merging happens in one step, so changing the model order will produce identical hashes, assuming you're not using Dare or Della, which add randomness to the merge.
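As a rough illustration of keeping the changes that agree and zeroing out the rest, the sketch below follows the three steps described in the TIES-Merging paper (trim each model's smallest changes, elect a sign per weight, then average only the changes that match that sign) on plain Python lists standing in for flattened tensors. It is not mergekit's implementation; `trim`, `ties_merge`, the `density` argument, and the toy numbers are hypothetical. `math.fsum` is used because it is exactly rounded, which makes the output identical for any model order, mirroring the identical-hash observation above.

```python
import math


def trim(delta, density):
    """Keep the `density` fraction of entries with the largest magnitude; zero the rest."""
    k = max(1, round(len(delta) * density))
    # Rank indices by magnitude, largest first (stable sort breaks ties by position).
    keep = set(sorted(range(len(delta)), key=lambda i: -abs(delta[i]))[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, models, density=0.5):
    """Sketch of TIES: trim each delta, elect a sign per weight, average agreeing deltas."""
    deltas = [trim([w - w0 for w, w0 in zip(model, base)], density) for model in models]
    merged = []
    for i, w0 in enumerate(base):
        column = [d[i] for d in deltas]
        # Elected sign = sign of the summed deltas. math.fsum is exactly rounded,
        # so this does not depend on the order of `models`.
        sign = 1.0 if math.fsum(column) >= 0 else -1.0
        agreeing = [v for v in column if v != 0.0 and math.copysign(1.0, v) == sign]
        # Disjoint mean: average only the nonzero deltas that match the elected sign.
        mean = math.fsum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(w0 + mean)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
a = [1.0, -0.5, 0.25, 0.0]
b = [-0.5, 0.5, -1.0, 0.0]
c = [0.75, -0.25, 0.0, 0.5]

print(ties_merge(base, [a, b, c]))  # → [0.875, -0.5, -1.0, 0.5]
print(ties_merge(base, [a, b, c]) == ties_merge(base, [c, a, b]))  # → True
```

In the toy run, the `-0.5` change that `b` makes to the first weight disagrees with the elected positive sign and is discarded, so only the changes from `a` and `c` are averaged.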
Task-Arithmetic is a linear merge that first subtracts the base model from each fine-tuned model, then merges the resulting differences in pairs, starting at the top of the list, and finally adds the result back onto the base model. The order of the models does matter in a Task-Arithmetic merge, and changing the model order will produce different hashes. A Task-Arithmetic merge keeps more of the individuality of the component models, with the last model merged having the strongest effect on the result. Task-Arithmetic can be unpredictable at times, since changing the order of the models can produce significantly different results, but it can be effective at combining the strengths of different models once you find the right order.
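The sketch below uses the textbook task-arithmetic formula: the base model plus a weighted sum of each model's difference from it. It does not reproduce mergekit's code or the pairwise merging order described above; `task_arithmetic` and the toy values are hypothetical. What it does show is one concrete way that reordering the same models changes the output bits, and therefore the hash: floating-point addition is not associative, so accumulating the same deltas in a different order can round differently.

```python
def task_arithmetic(base, models, weights):
    """Textbook task arithmetic: base + sum_i weights[i] * (models[i] - base).

    Deltas are accumulated in list order with ordinary float addition, so the
    low-order bits of the result can depend on the order of `models`.
    """
    mixed = [0.0] * len(base)
    for model, weight in zip(models, weights):
        for i, (w, w0) in enumerate(zip(model, base)):
            mixed[i] += weight * (w - w0)  # accumulate this model's weighted delta
    return [w0 + d for w0, d in zip(base, mixed)]


base = [0.0]
models = [[0.1], [0.2], [0.3]]
weights = [1.0, 1.0, 1.0]

forward = task_arithmetic(base, models, weights)
backward = task_arithmetic(base, models[::-1], weights[::-1])
print(forward, backward)    # → [0.6000000000000001] [0.6]
print(forward == backward)  # → False
```

The two results differ only in the last bit here, but that alone is enough to change a file hash.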
Dare, Della, and Breadcrumbs are all enhancements to Ties and Task-Arithmetic that aim to improve the resulting merge by zeroing out certain weights. While they all remove weights before merging takes place, each does it a bit differently. Dare uses a flat dropout rate, meaning every weight has an equal chance of being dropped; Della scales the dropout rate by the magnitude of change from the base model, giving the largest changes the smallest dropout rate; and Breadcrumbs first removes any outliers and then zeroes out weights, starting with the smallest changes, until it reaches the target density. I've done direct comparisons between Dare and Della using identical parameters, and Della has consistently outperformed Dare. I haven't tested Breadcrumbs much, but the idea behind it seems solid.
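The three schemes can be sketched as masks applied to one model's delta (its difference from the base model) before the Ties or Task-Arithmetic step. These are illustrative sketches, not mergekit's code: the function names and the `p`, `epsilon`, `density`, and `gamma` parameters are hypothetical, and the linear probability schedule in `della` is an assumption about how magnitude-scaled dropout could look. Rescaling surviving entries by `1 / (1 - p)` follows DARE's drop-and-rescale idea, which keeps each entry's expected value unchanged.

```python
import random


def dare(delta, p, rng):
    """DARE-style: every entry is dropped with the same probability p; survivors are rescaled."""
    return [0.0 if rng.random() < p else d / (1.0 - p) for d in delta]


def della(delta, p, epsilon, rng):
    """DELLA-style (assumed linear schedule): larger changes get lower drop probabilities."""
    n = len(delta)
    order = sorted(range(n), key=lambda i: -abs(delta[i]))  # rank 0 = largest change
    out = [0.0] * n
    for rank, i in enumerate(order):
        # Drop probability rises linearly with rank, from p - epsilon/2 to p + epsilon/2.
        p_i = p - epsilon / 2 + epsilon * rank / max(n - 1, 1)
        if rng.random() >= p_i:
            out[i] = delta[i] / (1.0 - p_i)  # rescale survivors by 1 / (1 - p_i)
    return out


def breadcrumbs(delta, density, gamma):
    """Breadcrumbs-style: zero the top `gamma` outliers, keep the next `density` fraction."""
    n = len(delta)
    order = sorted(range(n), key=lambda i: -abs(delta[i]))  # largest change first
    n_outliers = round(n * gamma)
    keep = set(order[n_outliers:n_outliers + round(n * density)])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


delta = [0.9, -0.05, 0.3, -0.6, 0.01, 0.2, -0.4, 0.1]
rng = random.Random(0)
print(dare(delta, p=0.5, rng=rng))
print(della(delta, p=0.5, epsilon=0.4, rng=rng))
print(breadcrumbs(delta, density=0.5, gamma=0.125))  # → [0.0, 0.0, 0.3, -0.6, 0.0, 0.2, -0.4, 0.0]
```

With `gamma=0.125` on eight entries, the Breadcrumbs sketch drops the single largest change (`0.9`) as an outlier, keeps the next four, and zeroes the three smallest.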
# Details
- **License**: [llama3](https://llama.meta.com/llama3/license/)