Lamarck-14B-v0.6 / README.md
metadata
language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - mergekit
  - merge
base_model:
  - sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - sometimesanotion/Lamarck-14B-v0.3
  - sometimesanotion/Qwenvergence-14B-v3-Prose
  - Krystalan/DRT-o1-14B
  - underwoods/medius-erebus-magnum-14b
  - sometimesanotion/Abliterate-Qwenvergence
  - huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
metrics:
  - accuracy
pipeline_tag: text-generation


Lamarck 14B v0.6: A generalist merge focused on multi-step reasoning, prose, multi-language ability, and code. It is built on components that have punched above their weight in the 14-billion-parameter class.

The tempo of Lamarck releases slowed because improving IFEval while maintaining other scores is no small task. Previous releases were based on a SLERP merge of model_stock->della branches focused on reasoning and prose. The prose branch got surprisingly good at reasoning, and the reasoning branch became a strong generalist in its own right. Some of you have already downloaded the latter as sometimesanotion/Qwen2.5-14B-Vimarckoso-v3.

Lamarck 0.6 aims to build upon Vimarckoso v3's all-around strength with improvements to prose and translation quality, and strong reasoning for its class. Updates will follow as leaderboards become available to evaluate it in depth. Even now, initial testing shows solid translation, problem-solving, and prose capability.

Merge Details

This model was made in two branches: a della_linear merge, and a sequence of model_stock and then breadcrumbs. The two branches were SLERP-merged as shown below.
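For readers new to SLERP, here is a minimal sketch of spherical linear interpolation on two flat weight vectors, using only the Python standard library. It is an illustration of the general technique, not mergekit's implementation, which works on full tensors and may handle edge cases differently. The function name `slerp` and the `eps` threshold are assumptions made for this example. As I understand mergekit's documentation, t = 0 returns the base model and t = 1 returns the other model, so a t of 0.30 stays closer to the base.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between vectors v0 (t=0) and v1 (t=1).

    Hypothetical helper for illustration; inputs are equal-length lists of floats.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    # Fall back to plain linear interpolation if either vector is (near) zero.
    if norm0 < eps or norm1 < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))

    # Nearly colinear vectors: the spherical formula divides by ~0, so use lerp.
    if abs(dot) > 1.0 - 1e-6:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / sin_theta  # weight on v0
    w1 = math.sin(t * theta) / sin_theta        # weight on v1
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Toy example: two orthogonal unit vectors blended at t = 0.30.
blended = slerp(0.30, [1.0, 0.0], [0.0, 1.0])
```

For two unit vectors the result stays on the unit circle, which is the property that distinguishes SLERP from a straight weighted average.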

Models Merged

Top influences: the model_stock, breadcrumbs, and della_linear merges all use the following models:

Configuration

This model was made in two branches: a della_linear merge, and a sequence of model_stock and then breadcrumbs+LoRA. They were finalized with the SLERP-merge below.

name:                Lamarck-14B-v0.6-rc4
merge_method:        slerp
base_model:          sometimesanotion/lamarck-14b-converge-della-linear
tokenizer_source:    base
dtype:               float32
out_dtype:           bfloat16
parameters:
  int8_mask:         true
  normalize:         true
  rescale:           false
  t:
    - value:         0.30
slices:
  - sources:
      - model:       sometimesanotion/lamarck-14b-converge-della-linear
        layer_range: [ 0, 8 ]
      - model:       sometimesanotion/lamarck-14b-converge-breadcrumbs
        layer_range: [ 0, 8 ]
  - sources:
      - model:       sometimesanotion/lamarck-14b-converge-della-linear
        layer_range: [ 8, 16 ]
      - model:       sometimesanotion/lamarck-14b-converge-breadcrumbs
        layer_range: [ 8, 16 ]
  - sources:
      - model:       sometimesanotion/lamarck-14b-converge-della-linear
        layer_range: [ 16, 24 ]
      - model:       sometimesanotion/lamarck-14b-converge-breadcrumbs
        layer_range: [ 16, 24 ]
  - sources:
      - model:       sometimesanotion/lamarck-14b-converge-della-linear
        layer_range: [ 24, 32 ]
      - model:       sometimesanotion/lamarck-14b-converge-breadcrumbs
        layer_range: [ 24, 32 ]
  - sources:
      - model:       sometimesanotion/lamarck-14b-converge-della-linear
        layer_range: [ 32, 40 ]
      - model:       sometimesanotion/lamarck-14b-converge-breadcrumbs
        layer_range: [ 32, 40 ]
  - sources:
      - model:       sometimesanotion/lamarck-14b-converge-della-linear
        layer_range: [ 40, 48 ]
      - model:       sometimesanotion/lamarck-14b-converge-breadcrumbs
        layer_range: [ 40, 48 ]
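
The slices above follow a regular pattern: six blocks of eight layers, each pairing the same layer range from both branches. As a rough sketch, the same structure could be generated rather than written by hand. The helper `make_slices` and its `num_layers` and `step` parameters are hypothetical names for this example; the value 48 is taken from the final layer_range in the configuration above.

```python
def make_slices(base, other, num_layers=48, step=8):
    """Build a mergekit-style `slices` list pairing equal layer ranges.

    Hypothetical helper for illustration; it only constructs plain dicts.
    """
    slices = []
    for start in range(0, num_layers, step):
        end = min(start + step, num_layers)
        slices.append({
            "sources": [
                {"model": base, "layer_range": [start, end]},
                {"model": other, "layer_range": [start, end]},
            ]
        })
    return slices

BASE = "sometimesanotion/lamarck-14b-converge-della-linear"
OTHER = "sometimesanotion/lamarck-14b-converge-breadcrumbs"

slices = make_slices(BASE, OTHER)

# Sanity check: consecutive ranges should be contiguous with no gaps or overlaps.
ranges = [s["sources"][0]["layer_range"] for s in slices]
contiguous = all(a[1] == b[0] for a, b in zip(ranges, ranges[1:]))
```

Splitting the layers into separate slices like this leaves room to vary the interpolation per block later, even though this configuration applies a single t to all of them.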