	---
	language:
	- en
	license: apache-2.0
	library_name: transformers
	tags:
	- mergekit
	- merge
	base_model:
	- sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
	- sometimesanotion/Lamarck-14B-v0.3
	- sometimesanotion/Qwenvergence-14B-v3-Prose
	- Krystalan/DRT-o1-14B
	- underwoods/medius-erebus-magnum-14b
	- sometimesanotion/Abliterate-Qwenvergence
	- huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
	metrics:
	- accuracy
	pipeline_tag: text-generation
	---
	![Lamarck.webp](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6/resolve/main/Lamarck.webp)
	---

	Lamarck 14B v0.6: A generalist merge focused on multi-step reasoning, prose, multi-language ability, and code. It is based on components that have punched above their weight in the 14-billion-parameter class.

	The tempo of Lamarck releases slowed because improving IFEVAL while maintaining other scores is no small task. Previous releases were based on a SLERP merge of model_stock->della branches focused on reasoning and prose. The prose branch got surprisingly good at reasoning, and the reasoning branch became a strong generalist in its own right. Some of you have already downloaded it as [sometimesanotion/Qwen2.5-14B-Vimarckoso-v3](https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3).

	Lamarck 0.6 aims to build upon Vimarckoso v3's all-around strength with improvements to prose and translation quality, and strong reasoning for its class. Updates will follow as leaderboards become available to evaluate it in depth. Initial testing already shows solid translation, problem-solving, and prose capability.

	## Merge Details

	This model was made in two branches: a della_linear merge, and a sequence of model_stock followed by breadcrumbs. The two branches were then SLERP-merged using the configuration given under Configuration.

	### Models Merged

	Top influences: The model_stock, breadcrumbs, and della_linear merges all use the following models:

	- [sometimesanotion/Qwen2.5-14B-Vimarckoso-v3](https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3) - As of this writing, Vimarckoso v3 has the #1 average score on [open-llm-leaderboard/open_llm_leaderboard](https://shorturl.at/m225j) for any model under 32 billion parameters. This appears to be because of synergy between its component models.
	- [sometimesanotion/Lamarck-14B-v0.3](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.3) - With heavy influence from [VAGOsolutions/SauerkrautLM-v2-14b-DPO](https://huggingface.co/VAGOsolutions/SauerkrautLM-v2-14b-DPO), this is a leader in technical answers.
	- [sometimesanotion/Qwenvergence-14B-v3-Prose](https://huggingface.co/sometimesanotion/Qwenvergence-14B-v3-Prose) - A model_stock merge of multiple prose-oriented models which posts surprisingly high MATH, GPQA, and MUSR scores, with apparent contributions from [EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2](https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2) and [sthenno-com/miscii-14b-1028](https://huggingface.co/sthenno-com/miscii-14b-1028).
	- [Krystalan/DRT-o1-14B](https://huggingface.co/Krystalan/DRT-o1-14B) - A particularly interesting model which applies extra reasoning to language translation. Check out their fascinating research paper at [arxiv.org/abs/2412.17498](https://arxiv.org/abs/2412.17498).
	- [underwoods/medius-erebus-magnum-14b](https://huggingface.co/underwoods/medius-erebus-magnum-14b) - The leading contributor to prose quality, as it's finetuned on datasets behind the well-recognized Magnum series.
	- [sometimesanotion/Abliterate-Qwenvergence](https://huggingface.co/sometimesanotion/Abliterate-Qwenvergence) - A custom version of [huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2](https://huggingface.co/huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2).

	### Configuration

	This model was made in two branches: a della_linear merge, and a sequence of model_stock followed by breadcrumbs+LoRA. The two branches were finalized with the SLERP merge below.

	```yaml
	name: Lamarck-14B-v0.6-rc4
	merge_method: slerp
	base_model: sometimesanotion/lamarck-14b-converge-della-linear
	tokenizer_source: base
	dtype: float32
	out_dtype: bfloat16
	parameters:
	  int8_mask: true
	  normalize: true
	  rescale: false
	  # interpolation factor: 0 keeps base_model, 1 takes the other model
	  t:
	    - value: 0.30
	# each slice pairs the same 8-layer range from both branches
	slices:
	  - sources:
	      - model: sometimesanotion/lamarck-14b-converge-della-linear
	        layer_range: [ 0, 8 ]
	      - model: sometimesanotion/lamarck-14b-converge-breadcrumbs
	        layer_range: [ 0, 8 ]
	  - sources:
	      - model: sometimesanotion/lamarck-14b-converge-della-linear
	        layer_range: [ 8, 16 ]
	      - model: sometimesanotion/lamarck-14b-converge-breadcrumbs
	        layer_range: [ 8, 16 ]
	  - sources:
	      - model: sometimesanotion/lamarck-14b-converge-della-linear
	        layer_range: [ 16, 24 ]
	      - model: sometimesanotion/lamarck-14b-converge-breadcrumbs
	        layer_range: [ 16, 24 ]
	  - sources:
	      - model: sometimesanotion/lamarck-14b-converge-della-linear
	        layer_range: [ 24, 32 ]
	      - model: sometimesanotion/lamarck-14b-converge-breadcrumbs
	        layer_range: [ 24, 32 ]
	  - sources:
	      - model: sometimesanotion/lamarck-14b-converge-della-linear
	        layer_range: [ 32, 40 ]
	      - model: sometimesanotion/lamarck-14b-converge-breadcrumbs
	        layer_range: [ 32, 40 ]
	  - sources:
	      - model: sometimesanotion/lamarck-14b-converge-della-linear
	        layer_range: [ 40, 48 ]
	      - model: sometimesanotion/lamarck-14b-converge-breadcrumbs
	        layer_range: [ 40, 48 ]

	```
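
For readers unfamiliar with SLERP, the final step above interpolates each pair of matching tensors along the arc between them instead of along a straight line. The following is a minimal pure-Python sketch of that idea on flat vectors. It is an illustration under simplifying assumptions, not mergekit's actual implementation; the function name `slerp` and the `eps` threshold are hypothetical choices made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flat vectors (illustrative only).

    t=0 returns v0 and t=1 returns v1; values in between follow the arc
    spanned by the two vectors.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / max(norm0 * norm1, eps)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    if sin_theta < eps:
        # Vectors are nearly colinear: plain linear interpolation is stable.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Toy stand-ins for one tensor from each branch (assumed values).
della_linear = [1.0, 0.0]
breadcrumbs = [0.0, 1.0]
blended = slerp(0.30, della_linear, breadcrumbs)
print([round(x, 3) for x in blended])  # [0.891, 0.454]
```

With `t` at 0.30, the blended vector stays closer to the first input, which here stands in for the della_linear branch used as `base_model`.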