24bean committed on
Commit 79e1d99 · 1 Parent(s): f3e1b83

Update README.md

Files changed (1): README.md (+73, −1)

README.md CHANGED
@@ -11,4 +11,76 @@ tags:
- kollama
- llama-2-ko
- text-generation-inference
---

# Llama 2 ko 7B - GGUF
- Model creator: [Meta](https://huggingface.co/meta-llama)
- Original model: [Llama 2 7B](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- Original Llama-2-Ko model: [Llama 2 ko 7B](https://huggingface.co/beomi/llama-2-ko-7b)
- Reference: [Llama 2 7B GGUF](https://huggingface.co/TheBloke/Llama-2-7B-GGUF)

<!-- description start -->
## Download

First, install the Hugging Face Hub CLI:

```shell
pip3 install 'huggingface-hub>=0.17.1'
```

Then you can download any individual model file to the current directory, at high speed, with a command like this:

```shell
huggingface-cli download 24bean/Llama-2-7B-ko-GGUF llama-2-ko-7b_q8_0.gguf --local-dir . --local-dir-use-symlinks False
```

Or you can download `llama-2-ko-7b.gguf`, the non-quantized model, with:

```shell
huggingface-cli download 24bean/Llama-2-7B-ko-GGUF llama-2-ko-7b.gguf --local-dir . --local-dir-use-symlinks False
```
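
If you prefer to download from Python instead of the CLI, a minimal sketch using the `huggingface_hub` library installed above (same repo id and file name as the commands above):

```python
from huggingface_hub import hf_hub_download

# Download a single GGUF file to the current directory without symlinks,
# mirroring the huggingface-cli command above.
model_path = hf_hub_download(
    repo_id="24bean/Llama-2-7B-ko-GGUF",
    filename="llama-2-ko-7b_q8_0.gguf",
    local_dir=".",
    local_dir_use_symlinks=False,
)
print(model_path)  # local path to the downloaded .gguf file
```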

## Example `llama.cpp` command

Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.

```shell
./main -ngl 32 -m llama-2-ko-7b_q8_0.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}"
```

Change `-ngl 32` to the number of layers to offload to the GPU (remove it if you don't have GPU acceleration), and `-c 4096` to the desired sequence length.

# How to run from Python code

You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries.
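
A ctransformers example is shown below. For llama-cpp-python, a minimal sketch (assuming the package is installed, e.g. `pip install llama-cpp-python`, and the quantized file has been downloaded as above) might look like this:

```python
from llama_cpp import Llama

# Load the quantized GGUF file; the file name assumes the q8_0 download above.
llm = Llama(
    model_path="./llama-2-ko-7b_q8_0.gguf",
    n_ctx=4096,       # context length, matching the llama.cpp example above
    n_gpu_layers=32,  # set to 0 if no GPU acceleration is available
)

output = llm("AI is going to", max_tokens=128)
print(output["choices"][0]["text"])
```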

## How to load this model from Python using ctransformers

### First install the package

```bash
# Base ctransformers with no GPU acceleration
pip install 'ctransformers>=0.2.24'
# Or with CUDA GPU acceleration
pip install 'ctransformers[cuda]>=0.2.24'
# Or with ROCm GPU acceleration
CT_HIPBLAS=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers
# Or with Metal GPU acceleration for macOS systems
CT_METAL=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers
```

### Simple example code to load one of these GGUF models

```python
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained("24bean/Llama-2-ko-7B-GGUF", model_file="llama-2-ko-7b_q8_0.gguf", model_type="llama", gpu_layers=50)

print(llm("AI is going to"))
```
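
ctransformers can also stream tokens as they are generated instead of returning the full completion at once; a small follow-up sketch reusing the `llm` object above:

```python
# Stream the completion token by token rather than waiting for the full string.
for text in llm("AI is going to", stream=True):
    print(text, end="", flush=True)
```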

## How to use with LangChain

Here are guides on using llama-cpp-python or ctransformers with LangChain, and a short sketch follows the list:

* [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp)
* [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers)
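
As a rough illustration of the ctransformers route, a hedged sketch (the exact import path depends on your LangChain version; `langchain_community` is assumed here, and the repo id and file name follow the download commands above):

```python
from langchain_community.llms import CTransformers

# Wrap the GGUF model in LangChain's CTransformers LLM class.
llm = CTransformers(
    model="24bean/Llama-2-7B-ko-GGUF",
    model_file="llama-2-ko-7b_q8_0.gguf",
    model_type="llama",
)

print(llm.invoke("AI is going to"))
```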

<!-- README_GGUF.md-how-to-run end -->