---
license: apache-2.0
datasets:
- andreabac3/Quora-Italian-Fauno-Baize
- andreabac3/StackOverflow-Italian-Fauno-Baize
- andreabac3/MedQuaAD-Italian-Fauno-Baize
language:
- it
- en
pipeline_tag: text-generation
---

# cerbero-7b Italian LLM 🚀

> 🚀 **New Release**: **cerbero-7b-openchat** is our latest SOTA model, based on [**openchat3.5**](https://github.com/imoneoi/openchat), delivering performance **on par with** or **superior** to **ChatGPT 3.5**!

> 🔥 The research paper unveiling the secrets behind **cerbero-7b** is now available on [arXiv](https://arxiv.org/abs/2311.15698)!

> 📢 **cerbero-7b** is the first **100% Free** and Open Source **Italian Large Language Model** (LLM) ready to be used for **research** or **commercial applications**.

**Try an online demo [here](https://huggingface.co/spaces/galatolo/chat-with-cerbero-7b)** (quantized demo running on CPU, a lot less powerful than the original cerbero-7b)

<p align="center">
  <img width="300" height="300" src="./README.md.d/cerbero.png">
</p>

**cerbero-7b** is built on top of [**mistral-7b**](https://mistral.ai/news/announcing-mistral-7b/), which outperforms Llama2 13B across all benchmarks and surpasses Llama1 34B in numerous metrics.

**cerbero-7b** is specifically crafted to fill the void in Italy's AI landscape.

A **Cambrian explosion** of **Italian Language Models** is essential for building advanced AI architectures that can cater to the diverse needs of the population.

**cerbero-7b**, alongside companions like [**Camoscio**](https://github.com/teelinsan/camoscio) and [**Fauno**](https://github.com/RSTLess-research/Fauno-Italian-LLM), aims to help **kick-start** this **revolution** in Italy, ushering in an era where sophisticated **AI solutions** can seamlessly interact with and understand the intricacies of the **Italian language**, thereby empowering **innovation** across **industries** and fostering a deeper **connection** between **technology** and the **people** it serves.

**cerbero-7b** is released under the **permissive** Apache 2.0 **license**, allowing **unrestricted usage**, even **for commercial applications**.

## Model Evaluation Results 📈

The `cerbero-7b` model has been rigorously evaluated across several benchmarks to demonstrate its proficiency in understanding and generating Italian text. Below are the summarized results showcasing its performance:

### SQuAD-it Evaluation

The Stanford Question Answering Dataset (SQuAD) in Italian (SQuAD-it) is used to evaluate the model's reading comprehension and question-answering capabilities. The following table presents the F1 score and Exact Match (EM) metrics:

| Model                                        | F1 Score | Exact Match (EM) |
|----------------------------------------------|--------------|----------------------|
| **cerbero-7b-openchat**                      | **74.09%**   | **56.0%**            |
| **cerbero-7b**                               | **72.55%**   | **55.6%**            |
| Fauno                                        | 44.46%       | 0.00%                |
| Camoscio                                     | 37.42%       | 0.00%                |
| mistral-7b                                   | 15.55%       | 8.50%                |
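
For readers unfamiliar with the two metrics, the sketch below shows how SQuAD-style Exact Match and token-level F1 are commonly computed for a single prediction. It is not the evaluation script used for the table above: the `normalize`, `exact_match`, and `f1_score` helpers are illustrative names, and the normalization (lowercasing and punctuation stripping, with no article removal) is an assumption.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace (assumed normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# A verbose answer that contains the reference gets partial F1 credit but no exact match
print(f1_score("La capitale è Roma", "Roma"), exact_match("La capitale è Roma", "Roma"))
```

Under this definition, an answer that wraps the correct span in extra words can earn a nonzero F1 while its Exact Match stays at zero.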

### EVALITA Benchmark Results

EVALITA benchmarks assess the model's performance in tasks like toxicity detection, irony detection, and sentiment analysis. The table below shows the F1 scores for these tasks:

| Model                                        | Toxicity Detection | Irony Detection | Sentiment Analysis |
|----------------------------------------------|--------------------|-----------------|--------------------|
| **cerbero-7b-openchat**                      | **63.33%**         | **69.16%**      | **66.89%**         |
| **cerbero-7b**                               | **63.04%**         | **48.51%**      | **61.80%**         |
| Fauno                                        | 33.84%             | 39.17%          | 12.23%             |
| Camoscio                                     | 38.18%             | 39.65%          | 13.33%             |
| mistral-7b                                   | 34.16%             | 34.16%          | 12.14%             |


## Why Cerbero? 🤔

The name "Cerbero," inspired by the three-headed dog that guards the gates of the Underworld in Greek mythology, encapsulates the essence of our model, drawing strength from three foundational pillars:

- **Base Model: mistral-7b** 🏗️
  cerbero-7b builds upon the formidable **mistral-7b** as its base model. This choice ensures a robust foundation, leveraging the power and capabilities of a cutting-edge language model.

- **Datasets: Cerbero Dataset** 📚
  The Cerbero Dataset is a groundbreaking collection specifically curated to enhance the proficiency of cerbero-7b in understanding and generating Italian text. This dataset is a product of an innovative method combining dynamic self-chat mechanisms with advanced Large Language Model (LLM) technology. Refer to the [paper](https://arxiv.org/abs/2311.15698) for more details.

- **Licensing: Apache 2.0** 🕊️
  Released under the **permissive Apache 2.0 license**, cerbero-7b promotes openness and collaboration. This licensing choice empowers developers with the freedom for unrestricted usage, fostering a community-driven approach to advancing AI in Italy and beyond.

## Models 🧬

**cerbero-7b** is available in various flavors, each tailored for specific applications and use cases. Below is a table listing these versions along with their respective training datasets and base models:

| Model Name              | Training Dataset  | Base Model  | Huggingface Model | Llama.cpp and Quantized Model |
|-------------------------|-------------------|-------------|-------------------|-------------------------------|
| cerbero-7b              | Cerbero Dataset   | mistral-7b  | [link](https://huggingface.co/galatolo/cerbero-7b) | [link](https://huggingface.co/galatolo/cerbero-7b-gguf) |
| cerbero-7b-openchat     | Cerbero Dataset   | openchat3.5 | [link](https://huggingface.co/galatolo/cerbero-7b-openchat) | [link](https://huggingface.co/galatolo/cerbero-7b-openchat-gguf) |


Each of these models brings its unique strengths to the table, making **cerbero-7b** a versatile tool for both research and commercial applications in the Italian language AI domain.

We are committed to continuously enhancing **cerbero-7b**. Our team plans to keep training and releasing new models as advancements in the 7b SOTA occur. This ensures that **cerbero-7b** remains at the forefront of AI technology, offering the most advanced and efficient solutions in the Italian language AI sector.

If you do not have enough RAM to fit the `float32` model (for example, when using Colab), each model also has a `float16` version, which you can load by passing the `revision="float16"` argument:

```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("galatolo/cerbero-7b", revision="float16")
```
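
As a rough back-of-the-envelope check of why the `float16` revision helps, the sketch below estimates the memory needed for the weights alone. The parameter count of about 7.24 billion is an assumption (it is not stated in this README), and activations, KV cache, and framework overhead are ignored.

```python
def weights_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights only, in GiB."""
    return n_params * bytes_per_param / 1024**3


N_PARAMS = 7.24e9  # assumed approximate parameter count of a 7b model

fp32_gib = weights_memory_gib(N_PARAMS, 4)  # float32 uses 4 bytes per parameter
fp16_gib = weights_memory_gib(N_PARAMS, 2)  # float16 uses 2 bytes per parameter

print(f"float32: ~{fp32_gib:.1f} GiB, float16: ~{fp16_gib:.1f} GiB")
```

Halving the bytes per parameter halves the footprint of the weights, which is often the difference between fitting and not fitting in a memory-constrained environment.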

## Training Details 🚀

**cerbero-7b** is a **fully fine-tuned** LLM, distinguishing itself from LoRA or QLoRA fine-tunes.
The model is trained on synthetic datasets generated through dynamic self-chat, with a large context window of **8192 tokens**.

### Dataset Composition ๐Ÿ“Š

> 📢 Details on the **Cerbero Dataset** will be updated shortly!

### Training Setup ⚙️

**cerbero-7b** is trained on an NVIDIA DGX H100:

- **Hardware:** Utilizing 8xH100 GPUs, each with 80 GB VRAM. 🖥️
- **Parallelism:** DeepSpeed ZeRO stage 1 parallelism for optimal training efficiency. ✨

The model has been trained for **1 epoch**, ensuring a convergence of knowledge and proficiency in handling diverse linguistic tasks.
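
To make the parallelism setting concrete, here is a minimal sketch of what a DeepSpeed configuration with ZeRO stage 1 can look like. Only the `"stage": 1` value comes from the description above; the precision and batch-size values are placeholders chosen for illustration, not the settings actually used to train **cerbero-7b**.

```python
import json

# Hypothetical DeepSpeed configuration: only ZeRO stage 1 is taken from this README
ds_config = {
    "zero_optimization": {"stage": 1},     # shard optimizer states across the GPUs
    "bf16": {"enabled": True},             # assumed mixed-precision setting
    "train_micro_batch_size_per_gpu": 1,   # placeholder value
    "gradient_accumulation_steps": 1,      # placeholder value
}

# Serialize to the JSON form that DeepSpeed reads from a config file
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

ZeRO stage 1 partitions only the optimizer states, so each GPU still holds a full copy of the weights and gradients.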

## Prompt Format

**cerbero-7b** is trained on full conversations using the following prompt format:

```
[|Umano|] First human message
[|Assistente|] First AI reply
[|Umano|] Second human message
[|Assistente|] Second AI reply
```

When crafting prompts, make sure the prompt ends with the `[|Assistente|]` tag, which signals the model to generate a response.
Use `[|Umano|]` as the stop word.

For example:

```
[|Umano|] Come posso distinguere un AI da un umano?
[|Assistente|]
```

While it's possible to include a brief system message at the start of your prompt, remember that the training data for **cerbero-7b** **does not** contain such **system messages**. Hence, it's recommended to minimize or avoid including them for optimal model performance.
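
The prompt format and stop word described above can be sketched with two small helpers. The names `build_prompt` and `extract_reply` are hypothetical and not part of any released package; the sketch assumes the generated text may echo the prompt and may run on into a new `[|Umano|]` turn.

```python
HUMAN_TAG = "[|Umano|]"
ASSISTANT_TAG = "[|Assistente|]"


def build_prompt(turns: list[tuple[str, str]], next_message: str) -> str:
    """Format past (human, assistant) turns followed by a new human message."""
    lines = []
    for human, assistant in turns:
        lines.append(f"{HUMAN_TAG} {human}")
        lines.append(f"{ASSISTANT_TAG} {assistant}")
    lines.append(f"{HUMAN_TAG} {next_message}")
    lines.append(ASSISTANT_TAG)  # the trailing tag cues the model to reply
    return "\n".join(lines)


def extract_reply(generated: str, prompt: str) -> str:
    """Drop the echoed prompt, then cut the completion at the stop word."""
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    return completion.split(HUMAN_TAG, 1)[0].strip()


prompt = build_prompt([], "Come posso distinguere un AI da un umano?")
print(prompt)
```

Passing the raw model output and the original prompt to `extract_reply` returns only the assistant's first reply, even when the backend does not support stop sequences natively.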

## Getting Started 🚀

You can load **cerbero-7b** (or **cerbero-7b-openchat**) using [🤗transformers](https://huggingface.co/docs/transformers/index):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("galatolo/cerbero-7b")
tokenizer = AutoTokenizer.from_pretrained("galatolo/cerbero-7b")

# The prompt ends with the [|Assistente|] tag so that the model writes the reply
prompt = """Questa è una conversazione tra un umano ed un assistente AI.
[|Umano|] Come posso distinguere un AI da un umano?
[|Assistente|]"""

# Tokenize the prompt and generate up to 128 new tokens without tracking gradients
input_ids = tokenizer(prompt, return_tensors='pt').input_ids
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=128)

# Decode the full sequence (prompt plus completion) back to text
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```

### GGUF and llama.cpp

**cerbero-7b** is fully **compatible** with [llama.cpp](https://github.com/ggerganov/llama.cpp).

You can find the **original** and **quantized** versions of **cerbero-7b** in the `gguf` format [here](https://huggingface.co/galatolo/cerbero-7b-gguf/tree/main).

```python
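from llama_cpp import Llama
from huggingface_hub import hf_hub_download

# Download the f16 GGUF weights from the Hub and load them with llama.cpp
llm = Llama(
    model_path=hf_hub_download(
        repo_id="galatolo/cerbero-7b-gguf",
        filename="ggml-model-f16.gguf",
    ),
    n_ctx=4086,
)

prompt = """Questa è una conversazione tra un umano ed un assistente AI.
[|Umano|] Come posso distinguere un AI da un umano?
[|Assistente|]"""

# Llama.generate expects token ids, so call the model on the prompt string instead,
# stopping at the next human turn
output = llm(prompt, max_tokens=128, stop=["[|Umano|]"])
print(output["choices"][0]["text"])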
```

## Citation 📖

If you use **cerbero-7b** in your research, please cite our paper:

```bibtex
@article{galatolo2023cerbero,
  title={Cerbero-7B: A Leap Forward in Language-Specific LLMs Through Enhanced Chat Corpus Generation and Evaluation},
  author={Galatolo, Federico A and Cimino, Mario GCA},
  journal={arXiv preprint arXiv:2311.15698},
  year={2023}
}
```

## Differences from the paper

> 📢 Attention: The released versions of `cerbero-7b` slightly differ from those used in the paper. The training dataset for the released models was generated using `garage-bAInd/Platypus2-70B-instruct` instead of `meta-llama/Llama-2-7b-chat-hf`, due to the more permissive license of the Platypus2 model (CC-BY-NC 4.0). Our tests indicate that both models produce datasets of comparable quality, and the resulting fine-tuned models demonstrate nearly indistinguishable performance.