Commit 9d17d94 (parent: 6e8fcba), committed by JosephusCheung: Update README.md

README.md (updated):
---
# A Chat Model, Testing only, no performance guarantee...

~~There is something wrong with the llama.cpp GGUF format; it will take some time to fix: [https://github.com/ggerganov/llama.cpp/pull/4283](https://github.com/ggerganov/llama.cpp/pull/4283)~~
Please use the latest version of llama.cpp with the GGUF quants: [CausalLM/72B-preview-GGUF](https://huggingface.co/CausalLM/72B-preview-GGUF)
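
For reference, a minimal sketch of running one of those GGUF quants through the llama-cpp-python bindings (a swapped-in route not mentioned in this README; the file name and all parameters below are placeholders, and the plain llama.cpp CLI works just as well):

```python
# Hypothetical sketch: run a GGUF quant via llama-cpp-python (pip install llama-cpp-python).
# The model path below is a placeholder, not an official artifact name.
from llama_cpp import Llama

llm = Llama(
    model_path="./72b-preview.q4_k_m.gguf",  # placeholder file from CausalLM/72B-preview-GGUF
    n_ctx=4096,        # assumed context window
    n_gpu_layers=-1,   # offload all layers to GPU, if built with GPU support
)

out = llm("User: Hello, who are you?\nAssistant:", max_tokens=128)
print(out["choices"][0]["text"])
```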
The model loads with the stock transformers library and requires no remote/external code: use AutoModelForCausalLM and AutoTokenizer (or manually specify LlamaForCausalLM for the model and GPT2Tokenizer for the tokenizer). Model quantization should be fully compatible with GGUF (llama.cpp), GPTQ, and AWQ.
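
A minimal loading sketch under those claims (the repo id "CausalLM/72B-preview" is inferred from the GGUF link above, and the prompt and generation parameters are illustrative):

```python
# Sketch: load with stock transformers classes; no trust_remote_code needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/72B-preview"  # assumed repo id, inferred from the GGUF repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)  # GPT2-style tokenizer, per the note above
model = AutoModelForCausalLM.from_pretrained(        # resolves to LlamaForCausalLM
    model_id,
    torch_dtype="auto",
    device_map="auto",  # requires accelerate; shards across available devices
)

inputs = tokenizer("User: Hello, who are you?\nAssistant:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```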
*Do not use wikitext for recalibration* (i.e., do not use the wikitext dataset as the calibration set when re-quantizing, e.g., for GPTQ or AWQ).