sometimesanotion
committed on
Update README.md
README.md
@@ -29,6 +29,8 @@ Lamarck 14B v0.6: A generalist merge focused on multi-step reasoning, prose, mu
Previous releases were based on a SLERP merge of model_stock+della branches focused on reasoning and prose. The prose branch got surprisingly good at reasoning, and the reasoning branch became a strong generalist in its own right. Some of you have already downloaded it as [sometimesanotion/Qwen2.5-14B-Vimarckoso-v3](https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3).
Lamarck 0.6 hit a whole new level of multi-pronged merge strategies:
- **Extracted LoRA adapters from special-purpose merges**
@@ -36,7 +38,7 @@ Lamarck 0.6 hit a whole new level of multi-pronged merge strategies:
- **Highly targeted weight/density gradients for every 2-4 layers**
- **Finalization through SLERP merges recombining the separate branches**
-This approach selectively merges the strongest aspects of its ancestors. Lamarck v0.6 is my most complex merge to date. The LoRA extractions alone pushed my hardware to
```yaml
name: Lamarck-14B-v0.6-rc4
@@ -86,4 +88,4 @@ slices:
```
-The strengths Lamarck has combined from its immediate ancestors are in turn derived from select finetunes and merges. Kudos to @arcee-ai, @CultriX, @sthenno-com, @Krystalan, @underwoods, @VAGOSolutions, and @rombodawg whose models had the most influence
Previous releases were based on a SLERP merge of model_stock+della branches focused on reasoning and prose. The prose branch got surprisingly good at reasoning, and the reasoning branch became a strong generalist in its own right. Some of you have already downloaded it as [sometimesanotion/Qwen2.5-14B-Vimarckoso-v3](https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3).
+A notable contribution from the middle to upper layers of Lamarck v0.6 comes from [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B). It has a fascinating research paper: [DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought](https://huggingface.co/papers/2412.17498).
+
Lamarck 0.6 hit a whole new level of multi-pronged merge strategies:
- **Extracted LoRA adapters from special-purpose merges**
- **Highly targeted weight/density gradients for every 2-4 layers**
- **Finalization through SLERP merges recombining the separate branches**
+This approach selectively merges the strongest aspects of its ancestors. Lamarck v0.6 is my most complex merge to date. The LoRA extractions alone pushed my hardware hard enough to be the building's sole source of heat for several winter days! By comparison, the SLERP merge below, which finalized it, was a simple step.
```yaml
name: Lamarck-14B-v0.6-rc4
```
+The strengths Lamarck has combined from its immediate ancestors are in turn derived from select finetunes and merges. Kudos to @arcee-ai, @CultriX, @sthenno-com, @Krystalan, @underwoods, @VAGOSolutions, and @rombodawg, whose models had the most influence. Of this model's immediate ancestors, [Vimarckoso v3](https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3) has the model card that documents the other finetunes in its extended lineage.
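
For readers unfamiliar with the SLERP step the README mentions, here is a minimal sketch of spherical linear interpolation between two flattened weight vectors. It is illustrative only: the function name `slerp`, the `eps` threshold, and the fallback to plain linear interpolation are assumptions of this sketch, not taken from the README or from any particular merge tool, whose implementation may differ.

```python
import math
from typing import List, Sequence


def slerp(t: float, v0: Sequence[float], v1: Sequence[float],
          eps: float = 1e-8) -> List[float]:
    """Spherically interpolate between v0 (t=0) and v1 (t=1).

    Hypothetical helper for illustration; operates on plain Python
    sequences standing in for flattened weight tensors.
    """
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    linear = [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction, so fall back to linear blending.
        return linear

    # Angle between the two vectors, with the cosine clamped for safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Nearly parallel or antiparallel: the SLERP weights are unstable.
        return linear

    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    # Halfway between two orthogonal unit vectors stays on the unit circle.
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

In a layer-wise merge, `t` would typically vary by layer so that different branches dominate at different depths; the specific values used for Lamarck are in the YAML config above, not in this sketch.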