brucethemoose committed
Commit
ffdfe3b
1 Parent(s): 034e3bb

Update README.md

Files changed (1): README.md (+4, -1)
README.md CHANGED

@@ -16,8 +16,11 @@ https://github.com/yule-BUAA/MergeLM
 
 https://github.com/cg123/mergekit/tree/dare'
 
-24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [Reddit post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/), and recommend exl2 quantizations on data similar to the desired task.
+24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [Reddit post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/), and recommend exl2 quantizations on data similar to the desired task, such as these targeted at fiction:
+
+[4.0bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction)
+
+[3.1bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-3.1bpw-fiction)
 ***
 
 Merged with the following config, and the tokenizer from chargoddard's Yi-Llama:
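
For context on the exl2 recommendation in the diff above, here is a minimal sketch of loading one of these quants at extended context with exllamav2's Python API. The local model path, the 45K `max_seq_len`, and the sampler settings are illustrative assumptions, and the 8-bit KV cache is one way to fit long contexts in 24GB of VRAM (the approach the linked Reddit post describes); treat this as a sketch rather than the exact setup from that post.

```python
# Sketch: load an exl2 quant at long context on a single 24GB GPU.
# Path and max_seq_len are placeholders, not values from the commit.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_8bit,
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction"  # local download
config.prepare()
config.max_seq_len = 45056  # ~45K tokens; raise toward 75K as VRAM allows

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)  # FP8 cache roughly halves KV memory
model.load_autosplit(cache)                    # allocate weights around the cache

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("Once upon a time,", settings, 200))
```

The lazy 8-bit cache plus `load_autosplit` is what makes the 45K-75K range plausible on 24GB: the cache reserves memory for the full context first, and the weights fill whatever remains.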