Commit c5c097e (verified) by sometimesanotion · 1 parent: e820fce

Update README.md

Files changed (1):
  1. README.md +52 -1
README.md (changed)
@@ -29,7 +29,7 @@ Lamarck 14B v0.6: A generalist merge focused on multi-step reasoning, prose, mu

Previous releases were based on a SLERP merge of model_stock+della branches focused on reasoning and prose. The prose branch got surprisingly good at reasoning, and the reasoning branch became a strong generalist in its own right. Some of you have already downloaded it as [sometimesanotion/Qwen2.5-14B-Vimarckoso-v3](https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3).

- A notable contribution to the middle to upper layers of Lamarck v0.6 comes from [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B). It has a fascinating research paper: [DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought](https://huggingface.co/papers/2412.17498). It is only a minor contribution, as I have not resolved IFEval issues with larger merges.
+ A notable contribution to the middle to upper layers of Lamarck v0.6 comes from [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B). It has a fascinating research paper: [DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought](https://huggingface.co/papers/2412.17498). It is only a minor contribution, as I have not resolved IFEval issues with larger merges. Rigorously tested CoT has not yet arrived in Lamarck.

Lamarck 0.6 hit a whole new level of toolchain-automated complexity with its multi-pronged merge strategies:

@@ -41,6 +41,56 @@ Lamarck 0.6 hit a whole new level of toolchain-automated complexity with its mul
This approach selectively merges the strongest aspects of its ancestors. Lamarck v0.6 is my most complex merge to date. The LoRA extractions alone pushed my hardware enough to be the building's sole source of heat for several winter days! By comparison, the SLERP merge below, which finalized it, was a simple step.

```yaml
+ ---
+ name: lamarck-14b-v0.6-005-model_stock
+ merge_method: model_stock
+ base_model: sometimesanotion/Qwenvergence-14B-Base-v2
+ tokenizer_source: sometimesanotion/Abliterate-Qwenvergence
+ dtype: float32
+ out_dtype: bfloat16
+ parameters:
+   int8_mask: true
+   normalize: true
+   rescale: false
+ models:
+   - model: arcee-ai/Virtuoso-Small-qv64
+   - model: Krystalan/DRT-o1-14B-qv128
+   - model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3-qv64
+   - model: sometimesanotion/Qwenvergence-14B-v3-Prose-qv256
+   - model: sometimesanotion/Abliterate-Qwenvergence
+ ---
+ name: lamarck-14b-converge-breadcrumbs
+ merge_method: breadcrumbs
+ base_model: sometimesanotion/lamarck-14b-v0.6-005-model_stock
+ tokenizer_source: base
+ dtype: bfloat16
+ out_dtype: bfloat16
+ parameters:
+   int8_mask: true
+   normalize: true
+   rescale: false
+   density: 0.95
+   weight: 1.00
+   gamma: 0.018
+ # Here there be dragons!
+ ---
+ name: lamarck-14b-converge-della-linear
+ merge_method: della_linear
+ base_model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
+ tokenizer_source: base
+ dtype: float32
+ out_dtype: bfloat16
+ parameters:
+   int8_mask: true
+   normalize: true
+   rescale: false
+   density: 0.95
+   weight: 1.00
+   epsilon: 0.018
+   lambda: 1.20
+   smoothing_factor: 0.07
+ # Yep, dragons.
+ ---
name: Lamarck-14B-v0.6-rc4
merge_method: slerp
base_model: sometimesanotion/lamarck-14b-converge-della-linear
@@ -54,6 +104,7 @@ parameters:
parameters:
  t:
    - value: 0.30
+ # Not so dragon-ish.
slices:
  - sources:
      - model: sometimesanotion/lamarck-14b-converge-della-linear
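
A note for anyone who wants to reproduce a run like the configuration added above: each `---`-separated document is a standalone mergekit config, and later stages reference earlier outputs by name (for example, the breadcrumbs stage builds on the model_stock output, and the final SLERP builds on the della_linear output), so the stages need to be merged in order. The sketch below is a minimal illustration of driving one such stage with mergekit's Python API, not the author's toolchain; the config path, output directory, and the exact `MergeOptions` fields are assumptions that may need adjusting for your mergekit version. The single-stage CLI equivalent is `mergekit-yaml <config.yaml> <output-dir>`.

```python
# Minimal sketch: run one mergekit stage programmatically.
# Assumes a recent mergekit release; API details may differ between versions.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "lamarck-14b-v0.6-rc4.yaml"  # hypothetical: one ---separated document saved to its own file
OUT_PATH = "./Lamarck-14B-v0.6-rc4"       # hypothetical output directory for the merged model

# Load the YAML document into mergekit's configuration object.
with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Execute the merge and write the resulting model to OUT_PATH.
run_merge(
    merge_config,
    out_path=OUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU if one is present
        copy_tokenizer=True,             # copy the tokenizer into the output directory
        lazy_unpickle=True,              # stream tensors to reduce peak RAM
    ),
)
```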