|
--- |
|
license: llama2 |
|
language: |
|
- ko |
|
pipeline_tag: text-generation |
|
tags: |
|
- llama
|
- facebook |
|
- meta
|
- llama-2 |
|
- kollama |
|
- llama-2-ko |
|
- llama-2-ko-chat |
|
- text-generation-inference |
|
--- |
|
|
|
# 💻 macOS Compatible 💻
|
|
|
# Llama 2 ko 7B - GGUF |
|
- Model creator: [Meta](https://huggingface.co/meta-llama) |
|
- Original model: [Llama 2 7B Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat) |
|
- Reference: [Llama 2 7B GGUF](https://huggingface.co/TheBloke/Llama-2-7B-GGUF) |
|
|
|
|
## Download

First, install the `huggingface-hub` Python library, which provides the `huggingface-cli` tool used below:

```shell
pip3 install 'huggingface-hub>=0.17.1'
```
|
|
|
Then you can download any individual model file to the current directory, at high speed, with a command like this: |
|
|
|
```shell |
|
huggingface-cli download 24bean/Llama-2-ko-7B-Chat-GGUF llama-2-ko-7b-chat-q8-0.gguf --local-dir . --local-dir-use-symlinks False |
|
``` |
|
|
|
Or you can download `llama-2-ko-7b-chat.gguf`, the non-quantized model, with:
|
|
|
```shell |
|
huggingface-cli download 24bean/Llama-2-ko-7B-Chat-GGUF llama-2-ko-7b-chat.gguf --local-dir . --local-dir-use-symlinks False |
|
``` |
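
If you prefer to stay in Python, the same files can be fetched with `hf_hub_download` from the `huggingface_hub` library installed above. A minimal sketch (the filename must match one of the GGUF files in this repository):

```python
from huggingface_hub import hf_hub_download

# Download the 8-bit quantized GGUF file into the current directory
model_path = hf_hub_download(
    repo_id="24bean/Llama-2-ko-7B-Chat-GGUF",
    filename="llama-2-ko-7b-chat-q8-0.gguf",
    local_dir=".",
)
print(model_path)
```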
|
|
|
## Example `llama.cpp` command |
|
|
|
Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later. |
|
|
|
```shell |
|
./main -ngl 32 -m llama-2-ko-7b-chat-q8-0.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}" |
|
``` |
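
Change `-ngl 32` to the number of layers to offload to the GPU (remove the flag if you have no GPU acceleration) and `-c 4096` to the desired context length. `-n -1` lets generation continue until the model stops on its own.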
|
|
|
## How to run from Python code
|
|
|
You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries. |
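
### How to load this model from Python using llama-cpp-python

A minimal sketch with `llama-cpp-python` (assuming `pip install llama-cpp-python` and that `llama-2-ko-7b-chat-q8-0.gguf` has already been downloaded to the current directory as shown above):

```python
from llama_cpp import Llama

# Set n_gpu_layers to the number of layers to offload to GPU; use 0 for CPU-only inference.
llm = Llama(
    model_path="./llama-2-ko-7b-chat-q8-0.gguf",
    n_ctx=4096,
    n_gpu_layers=32,
)

output = llm("인공지능은", max_tokens=128)
print(output["choices"][0]["text"])
```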
|
|
|
### How to load this model from Python using ctransformers
|
|
|
#### First install the package
|
|
|
```bash |
|
# Base ctransformers with no GPU acceleration
pip install 'ctransformers>=0.2.24'
# Or with CUDA GPU acceleration
pip install 'ctransformers[cuda]>=0.2.24'
# Or with ROCm GPU acceleration
CT_HIPBLAS=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers
# Or with Metal GPU acceleration for macOS systems
CT_METAL=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers
|
``` |
|
|
|
#### Simple example code to load one of these GGUF models
|
|
|
```python |
|
from ctransformers import AutoModelForCausalLM |
|
|
|
# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system. |
|
llm = AutoModelForCausalLM.from_pretrained(
    "24bean/Llama-2-ko-7B-Chat-GGUF",
    model_file="llama-2-ko-7b-chat-q8-0.gguf",
    model_type="llama",
    gpu_layers=50,
)
|
|
|
print(llm("인공지능은")) |
|
``` |
|
|
|
## How to use with LangChain |
|
|
|
Here are guides on using llama-cpp-python or ctransformers with LangChain:
|
|
|
* [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp) |
|
* [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers) |
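
For example, a minimal sketch with LangChain's `CTransformers` wrapper (assuming a recent `langchain-community` install and the quantized file name used above):

```python
from langchain_community.llms import CTransformers

# Loads the GGUF file from the Hugging Face repo via ctransformers
llm = CTransformers(
    model="24bean/Llama-2-ko-7B-Chat-GGUF",
    model_file="llama-2-ko-7b-chat-q8-0.gguf",
    model_type="llama",
)

print(llm.invoke("인공지능은"))
```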
|
|
|