granite-3.1-2b-base-Q6_K-GGUF / README.md

Update README.md

014b209 verified 13 days ago

4.27 kB

	---
	license: apache-2.0
	library_name: transformers
	tags:
	- language
	- granite-3.1
	- llama-cpp
	- gguf-my-repo
	base_model: ibm-granite/granite-3.1-2b-base
	---

	# Triangle104/granite-3.1-2b-base-Q6_K-GGUF
	This model was converted to GGUF format from [`ibm-granite/granite-3.1-2b-base`](https://huggingface.co/ibm-granite/granite-3.1-2b-base) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
	Refer to the [original model card](https://huggingface.co/ibm-granite/granite-3.1-2b-base) for more details on the model.

	---
	Model details:
	-
	Granite-3.1-2B-Base extends the context length of Granite-3.0-2B-Base
	from 4K to 128K using a progressive training strategy by increasing the
	supported context length in increments while adjusting RoPE theta until
	the model has successfully adapted to desired length of 128K. This
	long-context pre-training stage was performed using approximately 500B
	tokens.

	Developers: Granite Team, IBM
	GitHub Repository: ibm-granite/granite-3.1-language-models
	Website: Granite Docs
	Paper: Granite 3.1 Language Models (coming soon)
	Release Date: December 18th, 2024
	License: Apache 2.0


	Supported Languages:
	English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech,
	Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1
	models for languages beyond these 12 languages.


	Intended Use:
	Prominent use cases of LLMs in text-to-text generation include
	summarization, text classification, extraction, question-answering, and
	other long-context tasks. All Granite Base models are able to handle
	these tasks as they were trained on a large amount of data from various
	domains. Moreover, they can serve as baseline to create specialized
	models for specific application scenarios.


	Generation:
	This is a simple example of how to use Granite-3.1-2B-Base model.


	Install the following libraries:


	pip install torch torchvision torchaudio
	pip install accelerate
	pip install transformers



	Then, copy the code snippet below to run the example.


	from transformers import AutoModelForCausalLM, AutoTokenizer
	device = "auto"
	model_path = "ibm-granite/granite-3.1-2b-base"
	tokenizer = AutoTokenizer.from_pretrained(model_path)
	# drop device_map if running on CPU
	model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
	model.eval()
	# change input text as desired
	input_text = "Where is the Thomas J. Watson Research Center located?"
	# tokenize the text
	input_tokens = tokenizer(input_text, return_tensors="pt").to(device)
	# generate output tokens
	output = model.generate(**input_tokens,
	max_length=4000)
	# decode output tokens into text
	output = tokenizer.batch_decode(output)
	# print output
	print(output)



	Model Architecture:
	Granite-3.1-2B-Base is based on a decoder-only dense transformer
	architecture. Core components of this architecture are: GQA and RoPE,
	MLP with SwiGLU, RMSNorm, and shared input/output embeddings.

	---
	## Use with llama.cpp
	Install llama.cpp through brew (works on Mac and Linux)

	```bash
	brew install llama.cpp

	```
	Invoke the llama.cpp server or the CLI.

	### CLI:
	```bash
	llama-cli --hf-repo Triangle104/granite-3.1-2b-base-Q6_K-GGUF --hf-file granite-3.1-2b-base-q6_k.gguf -p "The meaning to life and the universe is"
	```

	### Server:
	```bash
	llama-server --hf-repo Triangle104/granite-3.1-2b-base-Q6_K-GGUF --hf-file granite-3.1-2b-base-q6_k.gguf -c 2048
	```

	Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.

	Step 1: Clone llama.cpp from GitHub.
	```
	git clone https://github.com/ggerganov/llama.cpp
	```

	Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
	```
	cd llama.cpp && LLAMA_CURL=1 make
	```

	Step 3: Run inference through the main binary.
	```
	./llama-cli --hf-repo Triangle104/granite-3.1-2b-base-Q6_K-GGUF --hf-file granite-3.1-2b-base-q6_k.gguf -p "The meaning to life and the universe is"
	```
	or
	```
	./llama-server --hf-repo Triangle104/granite-3.1-2b-base-Q6_K-GGUF --hf-file granite-3.1-2b-base-q6_k.gguf -c 2048
	```