|
--- |
|
license: llama2 |
|
language: |
|
- ko |
|
pipeline_tag: text-generation |
|
tags: |
|
- llama
|
- facebook |
|
- meta
|
- llama-2 |
|
- kollama |
|
- llama-2-ko |
|
- llama-2-ko-chat |
|
- text-generation-inference |
|
--- |
|
|
|
# 💻 macOS Compatible 💻
|
|
|
# Llama 2 ko 7B - GGUF |
|
- Model creator: [Meta](https://huggingface.co/meta-llama) |
|
- Original model: [Llama 2 7B Chat](https://huggingface.co/meta-llama/Llama-2-7b-chat) |
|
- Reference: [Llama 2 7B GGUF](https://huggingface.co/TheBloke/Llama-2-7B-GGUF) |
|
|
|
|
## Download

First, install the `huggingface-hub` Python library, which provides the `huggingface-cli` tool used below:

```shell
pip3 install 'huggingface-hub>=0.17.1'
```
|
|
|
Then you can download any individual model file to the current directory, at high speed, with a command like this: |
|
|
|
```shell |
|
huggingface-cli download 24bean/Llama-2-ko-7B-Chat-GGUF llama-2-ko-7b-chat-q8-0.gguf --local-dir . --local-dir-use-symlinks False |
|
``` |
|
|
|
Or you can download `llama-2-ko-7b-chat.gguf`, the non-quantized model, with:
|
|
|
```shell |
|
huggingface-cli download 24bean/Llama-2-ko-7B-Chat-GGUF llama-2-ko-7b-chat.gguf --local-dir . --local-dir-use-symlinks False |
|
``` |
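
If you prefer to stay in Python, the same files can be fetched with `hf_hub_download` from the `huggingface_hub` library installed above. A minimal sketch (the filename must match one of the GGUF files in this repository):

```python
from huggingface_hub import hf_hub_download

# Download the 8-bit quantized GGUF file into the current directory
model_path = hf_hub_download(
    repo_id="24bean/Llama-2-ko-7B-Chat-GGUF",
    filename="llama-2-ko-7b-chat-q8-0.gguf",
    local_dir=".",
)
print(model_path)
```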
|
|
|
## Example `llama.cpp` command |
|
|
|
Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later. |
|
|
|
```shell |
|
./main -ngl 32 -m llama-2-ko-7b-chat-q8-0.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "{prompt}" |
|
``` |
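
Change `-ngl 32` to the number of layers to offload to the GPU (remove the flag if you have no GPU acceleration) and `-c 4096` to the desired context length. `-n -1` lets generation continue until the model stops on its own.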
|
|
|
## How to run from Python code
|
|
|
You can use GGUF models from Python using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) or [ctransformers](https://github.com/marella/ctransformers) libraries. |
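
### How to load this model from Python using llama-cpp-python

A minimal sketch with `llama-cpp-python` (assuming `pip install llama-cpp-python` and that `llama-2-ko-7b-chat-q8-0.gguf` has already been downloaded to the current directory as shown above):

```python
from llama_cpp import Llama

# Set n_gpu_layers to the number of layers to offload to GPU; use 0 for CPU-only inference.
llm = Llama(
    model_path="./llama-2-ko-7b-chat-q8-0.gguf",
    n_ctx=4096,
    n_gpu_layers=32,
)

output = llm("인공지능은", max_tokens=128)
print(output["choices"][0]["text"])
```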
|
|
|
### How to load this model from Python using ctransformers
|
|
|
#### First install the package
|
|
|
```bash |
|
# Base ctransformers with no GPU acceleration
pip install 'ctransformers>=0.2.24'
# Or with CUDA GPU acceleration
pip install 'ctransformers[cuda]>=0.2.24'
# Or with ROCm GPU acceleration
CT_HIPBLAS=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers
# Or with Metal GPU acceleration for macOS systems
CT_METAL=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers
|
``` |
|
|
|
#### Simple example code to load one of these GGUF models
|
|
|
```python |
|
from ctransformers import AutoModelForCausalLM |
|
|
|
# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system. |
|
llm = AutoModelForCausalLM.from_pretrained(
    "24bean/Llama-2-ko-7B-Chat-GGUF",
    model_file="llama-2-ko-7b-chat-q8-0.gguf",
    model_type="llama",
    gpu_layers=50,
)
|
|
|
print(llm("인공지능은")) |
|
``` |
|
|
|
## How to use with LangChain |
|
|
|
Here are guides on using llama-cpp-python or ctransformers with LangChain:
|
|
|
* [LangChain + llama-cpp-python](https://python.langchain.com/docs/integrations/llms/llamacpp) |
|
* [LangChain + ctransformers](https://python.langchain.com/docs/integrations/providers/ctransformers) |
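
For example, a minimal sketch with LangChain's `CTransformers` wrapper (assuming a recent `langchain-community` install and the quantized file name used above):

```python
from langchain_community.llms import CTransformers

# Loads the GGUF file from the Hugging Face repo via ctransformers
llm = CTransformers(
    model="24bean/Llama-2-ko-7B-Chat-GGUF",
    model_file="llama-2-ko-7b-chat-q8-0.gguf",
    model_type="llama",
)

print(llm.invoke("인공지능은"))
```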
|
|
|