TheBloke committed
Commit cb5eec6
1 Parent(s): 55bdfaa

Upload README.md

Files changed (1): README.md +6 -1
README.md CHANGED
@@ -122,7 +122,10 @@ Refer to the Provided Files table below to see what files use which methods, and
 
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
-| [yi-34b-200k-dare-megamerge-v8.Q2_K.gguf](https://huggingface.co/TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF/blob/main/yi-34b-200k-dare-megamerge-v8.Q2_K.gguf) | Q2_K | 2 | 12.77 GB | 15.27 GB | smallest, significant quality loss - not recommended for most purposes |
+| [yi-34b-200k-dare-megamerge-v8.IQ2_XXS.gguf](https://huggingface.co/TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF/blob/main/yi-34b-200k-dare-megamerge-v8.IQ2_XXS.gguf) | IQ2_XXS | 2 | 9.31 GB | 11.81 GB | smallest size. 2.06 bpw. New IQuant method, Jan 2024 |
+| [yi-34b-200k-dare-megamerge-v8.IQ2_XS.gguf](https://huggingface.co/TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF/blob/main/yi-34b-200k-dare-megamerge-v8.IQ2_XS.gguf) | IQ2_XS | 2 | 10.31 GB | 12.81 GB | second smallest size. 2.31 bpw quant. New IQuant method, Jan 2024 |
+| [yi-34b-200k-dare-megamerge-v8.Q2_K_S.gguf](https://huggingface.co/TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF/blob/main/yi-34b-200k-dare-megamerge-v8.Q2_K_S.gguf) | Q2_K_S | 2 | 11.76 GB | 14.26 GB | significant quality loss - not recommended for most purposes. New method, Jan 2024 |
+| [yi-34b-200k-dare-megamerge-v8.Q2_K.gguf](https://huggingface.co/TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF/blob/main/yi-34b-200k-dare-megamerge-v8.Q2_K.gguf) | Q2_K | 2 | 12.77 GB | 15.27 GB | significant quality loss - not recommended for most purposes |
 | [yi-34b-200k-dare-megamerge-v8.Q3_K_S.gguf](https://huggingface.co/TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF/blob/main/yi-34b-200k-dare-megamerge-v8.Q3_K_S.gguf) | Q3_K_S | 3 | 14.96 GB | 17.46 GB | very small, high quality loss |
 | [yi-34b-200k-dare-megamerge-v8.Q3_K_M.gguf](https://huggingface.co/TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF/blob/main/yi-34b-200k-dare-megamerge-v8.Q3_K_M.gguf) | Q3_K_M | 3 | 16.65 GB | 19.15 GB | very small, high quality loss |
 | [yi-34b-200k-dare-megamerge-v8.Q3_K_L.gguf](https://huggingface.co/TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF/blob/main/yi-34b-200k-dare-megamerge-v8.Q3_K_L.gguf) | Q3_K_L | 3 | 18.14 GB | 20.64 GB | small, substantial quality loss |
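
For reference, fetching and running one of the files in the table above can be scripted; a minimal sketch, assuming `huggingface_hub` and `llama-cpp-python` are installed (the snippet and the 4096-token `n_ctx` are illustrative, not taken from this README):

```python
# Minimal sketch: fetch one quant from the table above and load it locally.
# Assumptions (not from this README): huggingface_hub and llama-cpp-python
# are installed, and a 4096-token context is enough for the use case.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="TheBloke/Yi-34B-200K-DARE-megamerge-v8-GGUF",
    filename="yi-34b-200k-dare-megamerge-v8.Q2_K.gguf",  # any filename from the table
)

# n_ctx is deliberately low; RAM use grows with context length.
llm = Llama(model_path=gguf_path, n_ctx=4096)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```

Any `filename` from the table works unchanged; the smaller IQ2/Q2 quants trade quality for the lower RAM figures listed above.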
@@ -353,6 +356,8 @@ Being a Yi model, run a lower temperature with 0.05 or higher MinP, a little rep
 
 I recommend exl2 quantizations profiled on data similar to the desired task. It is especially sensitive to the quantization data at low bpw. I've uploaded my own fiction-oriented quantizations here: https://huggingface.co/collections/brucethemoose/most-recent-merge-65742644ca03b6c514afa204
 
+LoneStriker has also uploaded more general-purpose quantizations here: https://huggingface.co/models?sort=trending&search=LoneStriker+Yi-34B-200K-DARE-megamerge-v8
+
 To load/train this in full-context backends like transformers, you *must* change `max_position_embeddings` in config.json to a value lower than 200,000, otherwise you will OOM! I do not recommend running high context without context-efficient backends like exllamav2, litellm or unsloth.
 
 
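For the `max_position_embeddings` note above, a minimal sketch of applying the override in code rather than hand-editing config.json; the repo id and the 32,768 cap are illustrative assumptions, not values from this README:

```python
# Minimal sketch of the config change described above. The repo id and the
# 32768 cap are assumptions for illustration, not values from this README.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "brucethemoose/Yi-34B-200K-DARE-megamerge-v8"  # assumed fp16 source repo

config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 32768  # must be lower than 200,000, per the note above

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```

Either route achieves the same thing as editing config.json by hand: the lower value just has to be in place before the model is instantiated.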