original model [weblab-10b-instruction-sft](https://huggingface.co/matsuo-lab/weblab-10b-instruction-sft), which is a Japanese-centric multilingual GPT-NeoX model with 10 billion parameters.

This model is a quantized (miniaturized) version of the original model (21.42 GB).
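Quantization here means storing the weights at lower precision. As a rough illustration of the idea (a toy round-to-nearest scheme with made-up names, not GPTQ's actual error-minimizing algorithm):

```python
import numpy as np

# Toy weight quantization: round-to-nearest with one float scale per row.
# GPTQ is smarter (it picks codes to minimize layer output error), but the
# size saving is the same idea: each float32 weight (32 bits) becomes a
# 4-bit code plus a shared per-row scale.

def quantize_rows(w, bits=4):
    """Quantize each row of w to signed `bits`-bit integer codes."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4-bit signed codes
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.round(w / scale).astype(np.int8)          # codes lie in [-qmax, qmax]
    return q, scale

def dequantize(q, scale):
    """Reconstruct an approximation of the original float weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_rows(w)
w_hat = dequantize(q, scale)

# Round-to-nearest bounds the per-weight error by half a quantization step.
max_err = np.abs(w - w_hat).max()
```

Storing 4-bit codes instead of 32-bit floats is roughly why the 21.42 GB original shrinks to the ~6 GB quantized files below, at the cost of a small reconstruction error per weight.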
There are currently two well-known quantization methods.

(1) GPTQ (this model, 6.3 GB)
The size is smaller and the execution speed is faster, but the inference performance may be a little worse than the original model.
At least one GPU is currently required due to a limitation of the Accelerate library, so this model cannot be run on the Hugging Face Spaces free tier.
You need the AutoGPTQ library to use this model.
(2) gguf ([matsuolab-weblab-10b-instruction-sft-gguf](https://huggingface.co/mmnga/matsuolab-weblab-10b-instruction-sft-gguf), 6.03 GB), created by mmnga.
You can use the gguf model with llama.cpp on a CPU-only machine, but it may be a little slower than GPTQ, especially for long text.
### sample code

Try it on [Google Colab (under development)](https://github.com/webbigdata-jp/python_sample/blob/main/weblab_10b_instruction_sft_GPTQ_sample.ipynb)

```
pip install auto-gptq