---
license: mit
language:
- en
---

Imatrix compressions of the full-precision merge "D_AU-Orac-13B-Tiefighter-slerp".

"Imatrix Plus" is an upgraded form of Imatrix that keeps full precision in specific parts of the model during compression.
As a result, all compressions are slightly larger than standard 13B compressions.

This method produces a higher-quality model, especially at the lower compression levels,
and is applied across all compressions from IQ1 to Q8.
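
As a rough illustration of the workflow (a sketch, not the exact commands used for this repo), an imatrix-guided quantization with llama.cpp's tools looks like the following; the model and file names are placeholders, and binary names vary by llama.cpp build:

```python
import subprocess

# 1) Build the importance matrix from a calibration text file.
#    Imatrix Plus additionally uses a larger-than-standard calibration file.
subprocess.run([
    "llama-imatrix",
    "-m", "tiefighter-orca-f16.gguf",  # full-precision source model (placeholder name)
    "-f", "calibration.txt",           # calibration corpus (placeholder name)
    "-o", "imatrix.dat",               # resulting importance matrix
], check=True)

# 2) Quantize, with the imatrix steering which weights keep more precision.
subprocess.run([
    "llama-quantize",
    "--imatrix", "imatrix.dat",
    "tiefighter-orca-f16.gguf",
    "tiefighter-orca-IQ1_S.gguf",
    "IQ1_S",  # any type from IQ1_S up to Q8_0 can be produced the same way
], check=True)
```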

Even IQ1_S - the most compressed version - works well; however, IQ4/Q4 is suggested as the minimum for quality.
Q6/Q8 give the highest quality.

How big a difference does this merge make?

Original Tiefighter IQ1_S (with imatrix enhancements) tested at a perplexity of:
PPL = 17.2589 +/- 0.12466*

Tiefighter Orca 2 IQ1_S (with imatrix enhancements) tested at a perplexity of:
PPL = 12.6985 +/- 0.09106*

Note that LOWER perplexity is better.

* Tested using llama.cpp's perplexity tool (perplexity.exe) with wiki.raw.
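
A run like the ones above can be reproduced roughly as follows (a sketch; the model file name is a placeholder, and the binary is perplexity.exe on Windows builds):

```python
import subprocess

# Compute perplexity over wiki.raw; lower is better.
subprocess.run([
    "llama-perplexity",
    "-m", "tiefighter-orca-IQ1_S.gguf",  # quantized model to evaluate (placeholder name)
    "-f", "wiki.raw",                    # evaluation text
], check=True)
```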

In addition, the Imatrix file used to "fix" the compressed files post-compression resulted in
over 2 whole points lower perplexity at IQ1_S versus some of the other "Imatrix" files currently in use.

Original Tiefighter IQ1_S (with imatrix enhancements), tested with a different "Imatrix" repair file, scored a perplexity of:
PPL = 19.6355 +/- 0.14435

Likewise, the merge itself affected perplexity.

This merge was an experiment to combine "Tiefighter"'s established roleplay, fiction, and story
generation with some of "Orca 2"'s qualities.

Additional merge experiments are in progress.

For Imatrix Plus, this was a test of keeping high precision in specific areas of the model, which leads to a slightly larger compressed file.
In addition, the Imatrix process itself used a larger "calibration" file than standard to further enhance quality.

The process added approximately 310 MB to each compressed file.

A blank or standard Alpaca template for text generation will work (see the sketch below).
Currently, "CHATML" is untested.

Context length: 4096.
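
A minimal sketch of running a quantized file with llama-cpp-python, using a standard Alpaca-style prompt and the full 4096-token context; the model file name is a placeholder:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="tiefighter-orca-Q6_K.gguf",  # placeholder file name
    n_ctx=4096,                              # match the model's context length
)

# Standard Alpaca template; a blank template also works.
prompt = (
    "### Instruction:\n"
    "Write the opening scene of a space-opera story.\n"
    "\n"
    "### Response:\n"
)

out = llm(prompt, max_tokens=256)
print(out["choices"][0]["text"])
```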

Please see the original model card for specific details of use, additional credits, and tips:

[KoboldAI/LLaMA2-13B-Tiefighter](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter)

# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### Merge Method

This model was merged using the SLERP merge method.
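
For intuition, spherical linear interpolation (SLERP) blends two weight tensors along the arc between them rather than along a straight line, preserving their scale better than plain averaging. A minimal sketch of the idea (not mergekit's exact implementation):

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between two flattened weight tensors."""
    u0 = v0 / (np.linalg.norm(v0) + eps)
    u1 = v1 / (np.linalg.norm(v1) + eps)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))  # angle between tensors
    if omega < eps:
        # Nearly parallel tensors: fall back to linear interpolation.
        return (1.0 - t) * v0 + t * v1
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1

# t = 0 keeps one model's tensor and t = 1 the other's; the configuration
# below varies t per layer and per tensor type (self_attn vs. mlp).
```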

### Models Merged

The following models were included in the merge:

* [microsoft/Orca-2-13b](https://huggingface.co/microsoft/Orca-2-13b)
* [KoboldAI/LLaMA2-13B-Tiefighter](https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
      - model: KoboldAI/LLaMA2-13B-Tiefighter
        layer_range: [0, 40]
      - model: microsoft/Orca-2-13b
        layer_range: [0, 40]
merge_method: slerp
base_model: microsoft/Orca-2-13b
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```
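
To reproduce the merge, a configuration like the one above can be passed to mergekit's command-line entry point, roughly as follows (the config and output paths are placeholders):

```python
import subprocess

# mergekit-yaml reads the YAML config and writes the merged model
# to the given output directory.
subprocess.run(
    ["mergekit-yaml", "config.yaml", "./merged-model"],
    check=True,
)
```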