original model [weblab-10b-instruction-sft](https://huggingface.co/matsuo-lab/weblab-10b-instruction-sft), which is a Japanese-centric multilingual GPT-NeoX model with 10 billion parameters.

This model is a quantized (miniaturized) version of the original model (21.42 GB).
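Quantization here means storing the weights at lower precision. As a rough illustration of the idea (a toy round-to-nearest scheme with made-up names, not GPTQ's actual error-minimizing algorithm):

```python
import numpy as np

# Toy weight quantization: round-to-nearest with one float scale per row.
# GPTQ is smarter (it picks codes to minimize layer output error), but the
# size saving is the same idea: each float32 weight (32 bits) becomes a
# 4-bit code plus a shared per-row scale.

def quantize_rows(w, bits=4):
    """Quantize each row of w to signed `bits`-bit integer codes."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4-bit signed codes
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.round(w / scale).astype(np.int8)          # codes lie in [-qmax, qmax]
    return q, scale

def dequantize(q, scale):
    """Reconstruct an approximation of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_rows(w)
w_hat = dequantize(q, scale)

# Round-to-nearest bounds the per-weight error by half a quantization step.
max_err = np.abs(w - w_hat).max()
```

Storing 4-bit codes instead of 32-bit floats is roughly why the 21.42 GB original shrinks to the ~6 GB quantized files below, at the cost of a small reconstruction error per weight.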
There are currently two well-known quantization methods.

(1) GPTQ (this model, 6.3 GB)
The size is smaller and the execution speed is faster, but the inference performance may be a little worse than the original model.
At least one GPU is currently required due to a limitation of the Accelerate library, so this model cannot be run on the Hugging Face Spaces free tier.
You need the AutoGPTQ library to use this model.
(2) gguf ([matsuolab-weblab-10b-instruction-sft-gguf](https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf), 6.03 GB), created by mmnga.
You can use the gguf model with llama.cpp on a CPU-only machine, but it may be a little slower than GPTQ, especially for long text.
### sample code

Try it on [Google Colab (under development)](https://github.com/webbigdata-jp/python_sample/blob/main/weblab_10b_instruction_sft_GPTQ_sample.ipynb)

```
pip install auto-gptq