dranger003/miquliz-120b-v2.0-iMat.GGUF

	---
	license: cc-by-nc-2.0
	---
	GGUF importance matrix (imatrix) quants for https://huggingface.co/wolfram/miquliz-120b-v2.0

	The importance matrix was computed over roughly 100K tokens (200 batches of 512 tokens) using wiki.train.raw.
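As a quick check of the "100K tokens" figure, the batch arithmetic can be sketched as below. The variable names are illustrative only; the two numbers come from the sentence above.

```python
# Values taken from the README text: 200 batches of 512 tokens each.
BATCHES = 200
TOKENS_PER_BATCH = 512

# Total number of calibration tokens seen while computing the imatrix.
total_tokens = BATCHES * TOKENS_PER_BATCH

print(total_tokens)  # 102400, i.e. roughly 100K
```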

	With IQ2_XXS, about 100 of the 141 layers seem to fit on a 24GB card at 2K context.

	| Layers | Context | Template |
	| --- | --- | --- |
	| <pre>140</pre> | <pre>32768</pre> | <pre>[INST] {prompt} [/INST]<br>{response}</pre> |
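A minimal sketch of applying the template from the table when building a prompt string. It assumes the `<br>` in the template stands for a single newline and that the model's response is generated after it. `build_prompt` is a hypothetical helper, not part of any library.

```python
# Template from the table above; "<br>" is assumed to mean one newline,
# with the model's response expected to follow it.
TEMPLATE = "[INST] {prompt} [/INST]\n"


def build_prompt(prompt: str) -> str:
    """Wrap a user prompt in the [INST] ... [/INST] template (hypothetical helper)."""
    return TEMPLATE.format(prompt=prompt.strip())


if __name__ == "__main__":
    # repr() makes the trailing newline visible.
    print(repr(build_prompt("Summarize the GGUF format in one sentence.")))
```

The resulting string can be passed as the prompt to whichever GGUF-compatible runtime is in use; whether that runtime adds a beginning-of-sequence token on its own depends on the runtime and is not covered here.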