brucethemoose committed
Commit 681df06
Parent: ffdfe3b
Update README.md

README.md CHANGED
@@ -15,12 +15,10 @@ pipeline_tag: text-generation
 https://github.com/yule-BUAA/MergeLM
 
 https://github.com/cg123/mergekit/tree/dare'
+***
 
-24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [Reddit post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/), and recommend exl2 quantizations on data similar to the desired task, such as these targeted at fiction:
-
-[4.0bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction)
 
-[3.1bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-3.1bpw-fiction)
+24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [Reddit post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/), and recommend exl2 quantizations on data similar to the desired task, such as these targeted at fiction: [4.0bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction) [3.1bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-3.1bpw-fiction)
 ***
 
 Merged with the following config, and the tokenizer from chargoddard's Yi-Llama:
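A rough sanity check on the 45K-75K context claim in the diffed paragraph. This is a sketch, not part of the commit: the layer and head counts are assumptions taken from Yi-34B's published `config.json` (60 layers, 8 KV heads under GQA, head dimension 128), and the 8-bit cache option refers to exllamav2's quantized KV cache.

```python
# Back-of-the-envelope KV-cache sizing for long-context Yi-34B-200K.
# Assumed model config (from Yi-34B's config.json; verify against the repo):
LAYERS = 60      # num_hidden_layers
KV_HEADS = 8     # num_key_value_heads (grouped-query attention)
HEAD_DIM = 128   # hidden_size / num_attention_heads = 7168 / 56

def kv_cache_gib(context_tokens: int, bytes_per_elem: int) -> float:
    """K+V cache size in GiB for a given context length and element width."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem  # K and V
    return context_tokens * per_token / 1024**3

for ctx in (45_000, 75_000):
    fp16 = kv_cache_gib(ctx, 2)  # 16-bit cache
    q8 = kv_cache_gib(ctx, 1)    # 8-bit cache (exllamav2 supports this)
    print(f"{ctx} tokens: fp16 cache {fp16:.1f} GiB, 8-bit cache {q8:.1f} GiB")
```

At 75K tokens the 8-bit cache comes to roughly 8.6 GiB, which together with ~3.1bpw weights (about 13 GiB for 34B parameters) is consistent with fitting in 24GB of VRAM, as the README claims.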