---
license: apache-2.0
library_name: transformers
tags:
- language
- granite-3.1
- llama-cpp
- gguf-my-repo
base_model: ibm-granite/granite-3.1-2b-base
---

# Triangle104/granite-3.1-2b-base-Q4_K_M-GGUF
This model was converted to GGUF format from [`ibm-granite/granite-3.1-2b-base`](https://huggingface.co/ibm-granite/granite-3.1-2b-base) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/ibm-granite/granite-3.1-2b-base) for more details on the model.

---
Model details:
Granite-3.1-2B-Base extends the context length of Granite-3.0-2B-Base from 4K to 128K using a progressive training strategy: the supported context length was increased in increments, adjusting RoPE theta at each step, until the model had successfully adapted to the target length of 128K. This long-context pre-training stage was performed using approximately 500B tokens.
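
As a quick sanity check on the long-context setup, the extended window and adjusted RoPE base should be visible in the configuration of the original (non-GGUF) checkpoint. A minimal sketch, assuming the standard `transformers` field names `max_position_embeddings` and `rope_theta`:

```python
from transformers import AutoConfig

# Minimal sketch: inspect the long-context settings of the original checkpoint.
# Field names follow the usual transformers conventions; check the released
# config.json for the authoritative values.
config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-2b-base")
print(config.max_position_embeddings)  # should reflect the 128K context window
print(config.rope_theta)               # RoPE theta adjusted during long-context training
```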

- Developers: Granite Team, IBM
- GitHub Repository: ibm-granite/granite-3.1-language-models
- Website: Granite Docs
- Paper: Granite 3.1 Language Models (coming soon)
- Release Date: December 18th, 2024
- License: Apache 2.0


Supported Languages:
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune Granite 3.1 models for languages beyond these 12 languages.


Intended Use:
Prominent use cases of LLMs in text-to-text generation include summarization, text classification, extraction, question answering, and other long-context tasks. All Granite Base models can handle these tasks, as they were trained on a large amount of data from various domains. Moreover, they can serve as baselines for creating specialized models for specific application scenarios.


Generation:
This is a simple example of how to use the Granite-3.1-2B-Base model.

Install the following libraries:

```bash
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```



Then, copy the code snippet below to run the example.


```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "auto"
model_path = "ibm-granite/granite-3.1-2b-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
input_text = "Where is the Thomas J. Watson Research Center located?"
# tokenize the text and move it to the model's device
input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
# generate output tokens
output = model.generate(**input_tokens, max_length=4000)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output)
```
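
The snippet above decodes greedily up to `max_length` tokens. As an optional variant (not part of the original card), the standard `generate` sampling arguments can be used instead, continuing from the snippet above:

```python
# Optional variant: sampled generation instead of greedy decoding.
# Continues from the snippet above (model, tokenizer, input_tokens already defined).
output = model.generate(
    **input_tokens,
    max_new_tokens=256,  # cap on newly generated tokens
    do_sample=True,      # enable sampling
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```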



Model Architecture:
Granite-3.1-2B-Base is based on a decoder-only dense transformer architecture. Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
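
For reference, the sketch below shows the general shape of a SwiGLU MLP block as described above. It is an illustrative example with placeholder module names and dimensions, not the actual Granite implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative SwiGLU MLP block (placeholder names, not the Granite source code).
class SwiGLUMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated linear unit followed by a down projection
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

# Placeholder sizes for demonstration only.
mlp = SwiGLUMLP(hidden_size=2048, intermediate_size=8192)
y = mlp(torch.randn(1, 8, 2048))  # (batch, sequence, hidden_size)
```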

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)

```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Triangle104/granite-3.1-2b-base-Q4_K_M-GGUF --hf-file granite-3.1-2b-base-q4_k_m.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Triangle104/granite-3.1-2b-base-Q4_K_M-GGUF --hf-file granite-3.1-2b-base-q4_k_m.gguf -c 2048
```
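
Once the server is running (it listens on port 8080 by default), completions can be requested over HTTP. Below is a minimal sketch using Python's standard library against the server's OpenAI-compatible completions endpoint; the host, port, and request fields are assumptions to adapt to your setup:

```python
import json
import urllib.request

# Minimal sketch: query a locally running llama-server (default port assumed).
payload = {
    "prompt": "The meaning to life and the universe is",
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["text"])
```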

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with any other hardware-specific flags (e.g., `LLAMA_CUDA=1` for NVIDIA GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/granite-3.1-2b-base-Q4_K_M-GGUF --hf-file granite-3.1-2b-base-q4_k_m.gguf -p "The meaning to life and the universe is"
```
or 
```
./llama-server --hf-repo Triangle104/granite-3.1-2b-base-Q4_K_M-GGUF --hf-file granite-3.1-2b-base-q4_k_m.gguf -c 2048
```