---
license: apache-2.0
library_name: transformers
tags:
- language
- granite-3.1
- llama-cpp
- gguf-my-repo
base_model: ibm-granite/granite-3.1-2b-base
---
# Triangle104/granite-3.1-2b-base-Q4_K_M-GGUF
This model was converted to GGUF format from [`ibm-granite/granite-3.1-2b-base`](https://huggingface.co/ibm-granite/granite-3.1-2b-base) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/ibm-granite/granite-3.1-2b-base) for more details on the model.
---
Model details:
-
Granite-3.1-2B-Base extends the context length of Granite-3.0-2B-Base
from 4K to 128K using a progressive training strategy: the supported
context length is increased in increments, with RoPE theta adjusted at
each step, until the model has adapted to the desired length of 128K.
This long-context pre-training stage was performed using approximately
500B tokens.
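
To make the role of RoPE theta more concrete, here is a minimal sketch of how the base value changes the rotary frequencies. It assumes the standard RoPE formulation, in which dimension pair `i` rotates with inverse frequency `theta^(-2i/d)`. The head dimension and the theta values below are hypothetical and are not taken from the Granite training recipe.

```python
import math


def rope_inverse_frequencies(head_dim: int, theta: float) -> list[float]:
    """Return the RoPE inverse frequencies theta^(-2i/d) for i = 0 .. d/2 - 1."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, theta: float) -> float:
    """Wavelength, in token positions, of the slowest-rotating dimension pair."""
    slowest = rope_inverse_frequencies(head_dim, theta)[-1]
    return 2.0 * math.pi / slowest


# Hypothetical values, for illustration only (not Granite's actual schedule).
HEAD_DIM = 64
for theta in (10_000.0, 1_000_000.0, 10_000_000.0):
    wavelength = longest_wavelength(HEAD_DIM, theta)
    print(f"theta={theta:>12,.0f}  longest wavelength ~ {wavelength:,.0f} positions")
```

A larger theta stretches the longest wavelength, which is one common intuition for why the base is raised as the context window grows.
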

- Developers: Granite Team, IBM
- GitHub Repository: ibm-granite/granite-3.1-language-models
- Website: Granite Docs
- Paper: Granite 3.1 Language Models (coming soon)
- Release Date: December 18th, 2024
- License: Apache 2.0

Supported Languages:
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech,
Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1
models for languages beyond these 12 languages.
Intended Use:
Prominent use cases of LLMs in text-to-text generation include
summarization, text classification, extraction, question-answering, and
other long-context tasks. All Granite Base models are able to handle
these tasks as they were trained on a large amount of data from various
domains. Moreover, they can serve as a baseline to create specialized
models for specific application scenarios.
Generation:
This is a simple example of how to use the Granite-3.1-2B-Base model.
Install the following libraries:
```bash
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
Then, copy the code snippet below to run the example.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "auto"
model_path = "ibm-granite/granite-3.1-2b-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
input_text = "Where is the Thomas J. Watson Research Center located?"
# tokenize the text and move it to the model's device
# ("auto" is a valid device_map but not a valid tensor device)
input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
# generate output tokens
output = model.generate(**input_tokens, max_length=4000)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output)
```
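
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded output above repeats the input text. The following is a minimal sketch of separating the two parts. The helper name and the token ids are hypothetical and operate on plain Python lists rather than tensors.

```python
def split_prompt_and_continuation(output_ids, prompt_length):
    """Split one generated sequence into its prompt part and its new part."""
    if not 0 <= prompt_length <= len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[:prompt_length], output_ids[prompt_length:]


# Hypothetical token ids, for illustration only.
prompt_ids = [101, 7, 42]
generated = prompt_ids + [9, 13, 2]
_, new_ids = split_prompt_and_continuation(generated, len(prompt_ids))
print(new_ids)  # [9, 13, 2]
```

With the example above, the prompt length would be `input_tokens["input_ids"].shape[1]`, and the remaining ids could be passed to `tokenizer.decode` with `skip_special_tokens=True`.
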
Model Architecture:
Granite-3.1-2B-Base is based on a decoder-only dense transformer
architecture. Core components of this architecture are: GQA and RoPE,
MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
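
As a rough illustration of three of these components, the sketch below implements RMSNorm, the SwiGLU gating step, and the query-to-key/value head mapping used by GQA in plain Python. It follows the textbook definitions, not the Granite source code. The function names, the `eps` default, and the head counts in the usage lines are assumptions chosen for readability.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then scale by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """SwiGLU gating: SiLU(gate) multiplied elementwise with up.

    In a full MLP, gate and up are two linear projections of the same input,
    and the result is passed through a third (down) projection.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """GQA: each consecutive group of query heads shares one key/value head."""
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


# Hypothetical sizes, for illustration only (not Granite's configuration).
print(rms_norm([3.0, 4.0], [1.0, 1.0]))
print(swiglu([0.0, 1.0], [5.0, 2.0]))
print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```
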
---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux):
```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.
### CLI:
```bash
llama-cli --hf-repo Triangle104/granite-3.1-2b-base-Q4_K_M-GGUF --hf-file granite-3.1-2b-base-q4_k_m.gguf -p "The meaning to life and the universe is"
```
### Server:
```bash
llama-server --hf-repo Triangle104/granite-3.1-2b-base-Q4_K_M-GGUF --hf-file granite-3.1-2b-base-q4_k_m.gguf -c 2048
```
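
Once the server is running, it can be queried over HTTP. The sketch below is one possible client using only the Python standard library. It assumes the server listens on its default address `http://localhost:8080` and exposes the `/completion` endpoint, which accepts a JSON body with `prompt` and `n_predict` and returns the generated text under `content`. The function names are hypothetical, and the base URL should be adjusted if the server was started with a different host or port.

```python
import json
import urllib.request


def build_completion_request(prompt, n_predict=64, base_url="http://localhost:8080"):
    """Build a POST request for the llama-server /completion endpoint."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the prompt to a running llama-server and return the generated text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["content"]


# Example call (requires a running llama-server):
# print(complete("The meaning to life and the universe is", n_predict=32))
```

Because this is a base model without a chat template, a plain completion request like this is usually a better fit than a chat-style request.
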
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with any other hardware-specific flags (for example, `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```
Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/granite-3.1-2b-base-Q4_K_M-GGUF --hf-file granite-3.1-2b-base-q4_k_m.gguf -p "The meaning to life and the universe is"
```
or
```
./llama-server --hf-repo Triangle104/granite-3.1-2b-base-Q4_K_M-GGUF --hf-file granite-3.1-2b-base-q4_k_m.gguf -c 2048
```