brucethemoose committed
Commit 681df06
Parent: ffdfe3b
Update README.md

README.md CHANGED
@@ -15,12 +15,10 @@ pipeline_tag: text-generation
 https://github.com/yule-BUAA/MergeLM
 
 https://github.com/cg123/mergekit/tree/dare'
+***
 
-24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [Reddit post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/), and recommend exl2 quantizations on data similar to the desired task, such as these targeted at fiction:
-
-[4.0bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction)
 
-[3.1bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-3.1bpw-fiction)
+24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [Reddit post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/), and recommend exl2 quantizations on data similar to the desired task, such as these targeted at fiction: [4.0bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction) [3.1bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-3.1bpw-fiction)
 ***
 
 Merged with the following config, and the tokenizer from chargoddard's Yi-Llama:
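A rough sanity check on the 45K-75K context claim in the diffed paragraph. This is a sketch, not part of the commit: the layer and head counts are assumptions taken from Yi-34B's published `config.json` (60 layers, 8 KV heads under GQA, head dimension 128), and the 8-bit cache option refers to exllamav2's quantized KV cache.

```python
# Back-of-the-envelope KV-cache sizing for long-context Yi-34B-200K.
# Assumed model config (from Yi-34B's config.json; verify against the repo):
LAYERS = 60      # num_hidden_layers
KV_HEADS = 8     # num_key_value_heads (grouped-query attention)
HEAD_DIM = 128   # hidden_size / num_attention_heads = 7168 / 56

def kv_cache_gib(context_tokens: int, bytes_per_elem: int) -> float:
    """K+V cache size in GiB for a given context length and element width."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem  # K and V
    return context_tokens * per_token / 1024**3

for ctx in (45_000, 75_000):
    fp16 = kv_cache_gib(ctx, 2)  # 16-bit cache
    q8 = kv_cache_gib(ctx, 1)    # 8-bit cache (exllamav2 supports this)
    print(f"{ctx} tokens: fp16 cache {fp16:.1f} GiB, 8-bit cache {q8:.1f} GiB")
```

At 75K tokens the 8-bit cache comes to roughly 8.6 GiB, which together with ~3.1bpw weights (about 13 GiB for 34B parameters) is consistent with fitting in 24GB of VRAM, as the README claims.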