---
license: apache-2.0
language:
- en
- de
- es
- fr
- it
- pt
- pl
- nl
- tr
- sv
- cs
- el
- hu
- ro
- fi
- uk
- sl
- sk
- da
- lt
- lv
- et
- bg
- no
- ca
- hr
- ga
- mt
- gl
- zh
- ru
- ko
- ja
- ar
- hi
---
# Model Card for EuroLLM-1.7B-Instruct


This is the model card for EuroLLM-1.7B-Instruct, the first instruction-tuned model of the EuroLLM series. The pre-trained version is also available: [EuroLLM-1.7B](https://huggingface.co/utter-project/EuroLLM-1.7B).

- **Developed by:** Unbabel, Instituto Superior Técnico, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université.
- **Funded by:** European Union.
- **Model type:** A 1.7B parameter instruction-tuned multilingual transformer LLM.
- **Language(s) (NLP):** Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian. 
- **License:** Apache License 2.0.

## Model Details

The EuroLLM project aims to create a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages.
EuroLLM-1.7B is a 1.7B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: Web data, parallel data (en-xx and xx-en), and high-quality datasets.
EuroLLM-1.7B-Instruct was further instruction tuned on EuroBlocks, an instruction tuning dataset with a focus on general instruction-following and machine translation.


### Model Description

EuroLLM uses a standard, dense Transformer architecture:
- We use grouped query attention (GQA) with 8 key-value heads, since it has been shown to increase speed at inference time while maintaining downstream performance.
- We perform pre-layer normalization, since it improves training stability, and use RMSNorm, which is faster to compute than standard layer normalization.
- We use the SwiGLU activation function, since it has been shown to lead to good results on downstream tasks.
- We use rotary positional embeddings (RoPE) in every layer, since these have been shown to lead to good performance while allowing the extension of the context length.

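One way to see why GQA helps at inference time is to look at how query heads share key-value heads and how that shrinks the key-value cache. The sketch below uses the head counts, layer count, embedding size, and sequence length given in this card; the head dimension (embedding size divided by number of heads) and a 2-byte BF16 cache are assumptions made for illustration, and the helper names are hypothetical.

```python
# Values taken from the hyper-parameters listed in this card.
NUM_LAYERS = 24
NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 8
EMBEDDING_SIZE = 2_048
SEQUENCE_LENGTH = 4_096

# Assumptions (not stated in the card): head_dim = embedding size / heads,
# and the key-value cache is stored in 2-byte BF16.
HEAD_DIM = EMBEDDING_SIZE // NUM_QUERY_HEADS
BYTES_PER_VALUE = 2


def kv_head_for_query_head(query_head: int) -> int:
    """Return the key-value head that a given query head shares under GQA."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS
    return query_head // group_size


def kv_cache_bytes(num_kv_heads: int, num_tokens: int) -> int:
    """Bytes needed to cache keys and values for num_tokens positions."""
    per_token = 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


gqa_bytes = kv_cache_bytes(NUM_KV_HEADS, SEQUENCE_LENGTH)
mha_bytes = kv_cache_bytes(NUM_QUERY_HEADS, SEQUENCE_LENGTH)

# Each pair of consecutive query heads maps to one key-value head.
print([kv_head_for_query_head(h) for h in range(NUM_QUERY_HEADS)])
print(f"GQA cache: {gqa_bytes / 2**20:.0f} MiB")
print(f"Full multi-head cache: {mha_bytes / 2**20:.0f} MiB")
```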
For pre-training, we use 256 NVIDIA H100 GPUs of the MareNostrum 5 supercomputer. We train the model with a constant batch size of 3,072 sequences (approximately 12 million tokens), the Adam optimizer, and BF16 precision.
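
The batch size in tokens follows from the sequence length of 4,096 listed in this card's hyper-parameter table. The short sketch below reproduces that arithmetic and gives a rough step count for the 4 trillion training tokens; it assumes every sequence is fully packed to the maximum length, which the card does not state.

```python
SEQUENCES_PER_BATCH = 3_072
SEQUENCE_LENGTH = 4_096        # tokens per sequence, from the hyper-parameter table
TRAINING_TOKENS = 4 * 10**12   # "4 trillion tokens" from the Model Details section

# Assumption: every sequence in the batch is filled to the full sequence length.
tokens_per_batch = SEQUENCES_PER_BATCH * SEQUENCE_LENGTH

# Rough number of optimizer steps needed to see all training tokens once.
approx_steps = TRAINING_TOKENS // tokens_per_batch

print(f"Tokens per batch: {tokens_per_batch:,}")
print(f"Approximate training steps: {approx_steps:,}")
```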
Here is a summary of the model hyper-parameters:

| Hyper-parameter                      | Value                |
|--------------------------------------|----------------------|
| Sequence Length                      | 4,096                |
| Number of Layers                     | 24                   |
| Embedding Size                       | 2,048                |
| FFN Hidden Size                      | 5,632                |
| Number of Heads                      | 16                   |
| Number of KV Heads (GQA)             | 8                    |
| Activation Function                  | SwiGLU               |
| Position Encodings                   | RoPE (θ = 10,000)    |
| Layer Norm                           | RMSNorm              |
| Tied Embeddings                      | No                   |
| Embedding Parameters                 | 0.262B               |
| LM Head Parameters                   | 0.262B               |
| Non-embedding Parameters             | 1.133B               |
| Total Parameters                     | 1.657B               |

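The parameter counts in the table can be reproduced from the other hyper-parameters. The following is a sketch under stated assumptions rather than the authors' accounting: the vocabulary size of 128,000 is inferred from the 0.262B embedding parameters and is not given in this card, and the layers are assumed to have no bias terms, two RMSNorm weight vectors each, and one final RMSNorm.

```python
# Values from the hyper-parameter table above.
NUM_LAYERS = 24
EMBEDDING_SIZE = 2_048
FFN_HIDDEN_SIZE = 5_632
NUM_HEADS = 16
NUM_KV_HEADS = 8

# Assumptions, not stated in this card: a 128,000-entry vocabulary
# (inferred from 0.262B embedding parameters / 2,048), no bias terms,
# two RMSNorm weight vectors per layer plus one final RMSNorm.
VOCAB_SIZE = 128_000
HEAD_DIM = EMBEDDING_SIZE // NUM_HEADS


def attention_params() -> int:
    """Weights of the query, key, value, and output projections."""
    query_and_output = 2 * EMBEDDING_SIZE * (NUM_HEADS * HEAD_DIM)
    key_and_value = 2 * EMBEDDING_SIZE * (NUM_KV_HEADS * HEAD_DIM)
    return query_and_output + key_and_value


def ffn_params() -> int:
    """SwiGLU uses three projections: gate, up, and down."""
    return 3 * EMBEDDING_SIZE * FFN_HIDDEN_SIZE


per_layer = attention_params() + ffn_params() + 2 * EMBEDDING_SIZE
non_embedding = NUM_LAYERS * per_layer + EMBEDDING_SIZE
embedding = VOCAB_SIZE * EMBEDDING_SIZE
lm_head = VOCAB_SIZE * EMBEDDING_SIZE  # embeddings are not tied
total = non_embedding + embedding + lm_head

for name, count in [
    ("Embedding", embedding),
    ("LM head", lm_head),
    ("Non-embedding", non_embedding),
    ("Total", total),
]:
    print(f"{name}: {count / 1e9:.3f}B")
```

Under these assumptions the four printed values round to the figures in the table, which suggests the table counts the input embeddings and the LM head separately.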
## Run the model
    
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_id = "utter-project/EuroLLM-1.7B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    
    text = "English: My name is EuroLLM. Portuguese:"
    
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

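The prompt in the example above follows a simple "source language: text, target language:" pattern. A small helper can build such prompts for other language pairs. `build_translation_prompt` is a hypothetical name introduced here for illustration, not part of `transformers` or of the model's official usage, and whether this plain format is the best prompt for the instruction-tuned model is an assumption.

```python
def build_translation_prompt(source_lang: str, target_lang: str, text: str) -> str:
    """Format a prompt in the '<source>: <text> <target>:' style used above.

    Hypothetical helper for illustration only.
    """
    return f"{source_lang}: {text.strip()} {target_lang}:"


prompt = build_translation_prompt("English", "Portuguese", "My name is EuroLLM.")
print(prompt)
```

The resulting string can be passed to `tokenizer(...)` in place of `text` in the example above.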


## Results

### Machine Translation

We evaluate EuroLLM-1.7B-Instruct on several machine translation benchmarks (FLORES-200, WMT-23, and WMT-24), comparing it with [Gemma-2B](https://huggingface.co/google/gemma-2b) and [Gemma-7B](https://huggingface.co/google/gemma-7b) (both also instruction tuned on EuroBlocks).
The results show that EuroLLM-1.7B-Instruct is substantially better than Gemma-2B at machine translation and competitive with Gemma-7B.

#### Flores-200
| Model                          | AVG  | AVG en-xx | AVG xx-en | en-ar | en-bg | en-ca | en-cs | en-da | en-de | en-el | en-es-latam | en-et | en-fi | en-fr | en-ga | en-gl | en-hi | en-hr | en-hu | en-it | en-ja | en-ko | en-lt | en-lv | en-mt | en-nl | en-no | en-pl | en-pt-br | en-ro | en-ru | en-sk | en-sl | en-sv | en-tr | en-uk | en-zh-cn | ar-en | bg-en | ca-en | cs-en | da-en | de-en | el-en | es-latam-en | et-en | fi-en | fr-en | ga-en | gl-en | hi-en | hr-en | hu-en | it-en | ja-en | ko-en | lt-en | lv-en | mt-en | nl-en | no-en | pl-en | pt-br-en | ro-en | ru-en | sk-en | sl-en | sv-en | tr-en | uk-en | zh-cn-en |
|--------------------------------|------|-----------|-----------|-------|-------|-------|-------|-------|-------|-------|--------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----------|-------|-------|-------|-------|-------|-------|-------|----------|-------|-------|-------|-------|-------|-------|-------|--------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----------|-------|-------|-------|-------|-------|-------|-------|----------|
| EuroLLM-1.7B-Instruct     | 86.75|	86.49|	87.01|85.00|	89.37|	84.65|	89.23|	89.54|	87.00|	87.68|	86.43|	88.66|	89.18|	87.65|	74.66|	86.68|	76.94|	84.86|	86.43|	88.19|	89.45|	87.33|	87.73|	87.74|	67.66|	87.16|	90.08|	88.10|	89.39|	89.39|	88.11|	88.13|	87.08|	89.67|	87.21|	87.76|	86.45|	86.15|	87.46|	87.49|	87.97|	89.65|	88.80|	86.88|	86.70|	88.19|	88.66|	88.93|	81.95|	87.38|	87.97|	86.69|	87.15|	87.69|	86.78|	86.76|	85.39|	86.23|	76.93|	87.02|	90.24|	85.54|	89.22|	88.81|	86.02|	87.24|	86.81|	89.55|	87.76|	86.37|	86.00 |
| Gemma-2B-EuroBlocks       | 81.56| 78.93     | 84.18     | 75.25 | 82.46 | 83.17 | 82.17 | 84.40 | 83.20 | 79.63 | 84.15        | 72.63 | 81.00 | 85.12 | 38.79 | 82.00 | 67.00 | 81.18 | 78.24 | 84.80 | 87.08 | 82.04 | 73.02 | 68.41 | 56.67 | 83.30 | 86.69 | 83.07   | 86.82 | 84.00 | 84.55 | 77.93 | 76.19 | 80.77 | 79.76 | 84.19   | 84.10 | 83.67 | 85.73 | 86.89 | 86.38 | 88.39 | 88.11 | 84.68        | 86.11 | 83.45 | 86.45 | 88.22 | 50.88 | 86.44 | 85.87 | 85.33 | 85.16 | 86.75 | 85.62 | 85.00 | 81.55 | 81.45 | 67.90 | 85.95 | 89.05   | 84.18 | 88.27 | 87.38 | 85.13 | 85.22 | 83.86 | 87.83   | 84.96 | 85.15 | 85.10 |
| Gemma-7B-EuroBlocks       | 86.16| 85.49     | 86.82     | 83.39 | 88.32 | 85.82 | 88.88 | 89.01 | 86.96 | 86.62 | 86.31        | 84.42 | 88.11 | 87.46 | 61.85 | 86.10 | 77.91 | 87.01 | 85.81 | 87.57 | 89.88 | 87.24 | 84.47 | 83.15 | 67.13 | 86.50 | 90.44 | 87.57   | 89.22 | 89.13 | 88.58 | 86.73 | 84.68 | 88.16 | 86.87 | 88.40   | 87.11 | 86.65 | 87.25 | 88.17 | 87.47 | 89.59 | 88.44 | 86.76        | 86.66 | 87.55 | 88.88 | 88.86 | 73.46 | 87.63 | 88.43 | 87.12 | 87.31 | 87.49 | 87.20 | 87.15 | 85.16 | 85.96 | 78.39 | 86.73 | 90.52   | 85.38 | 89.17 | 88.75 | 86.35 | 86.82 | 86.21 | 89.39   | 88.20 | 86.45 | 86.28 |


#### WMT-23
| Model                          | AVG  | AVG en-xx | AVG xx-en | AVG xx-xx | en-de | en-cs | en-uk | en-ru | en-zh-cn | de-en | uk-en | ru-en | zh-cn-en | cs-uk |
|--------------------------------|------|-----------|-----------|-----------|-------|-------|-------|-------|----------|-------|-------|-------|----------|-------|
| EuroLLM-1.7B-Instruct     | 83.13| 82.91     | 82.48     | 86.87     | 81.33 | 85.42 | 81.61 | 82.57 | 83.62    | 84.24 | 85.36 | 81.56 | 78.76    | 86.87 |
| Gemma-2B-EuroBlocks       | 79.86| 78.35     | 81.32     | 81.56     | 76.54 | 76.35 | 77.62 | 78.88 | 82.36    | 82.85 | 83.83 | 80.17 | 78.42    | 81.56 |
| Gemma-7B-EuroBlocks       | 83.90| 83.70     | 83.21     | 87.61     | 82.15 | 84.68 | 83.05 | 83.85 | 84.79    | 84.40 | 85.86 | 82.55 | 80.01    | 87.61 |

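The aggregate columns in the WMT-23 table appear to be unweighted means over language pairs. The sketch below recomputes them for the EuroLLM-1.7B-Instruct row from the per-pair scores in the table; treating the averages as simple macro-averages is an assumption that happens to match the reported values, and the grouping helper is hypothetical.

```python
from statistics import mean

# Per-pair scores for EuroLLM-1.7B-Instruct, copied from the WMT-23 table.
scores = {
    "en-de": 81.33, "en-cs": 85.42, "en-uk": 81.61, "en-ru": 82.57,
    "en-zh-cn": 83.62, "de-en": 84.24, "uk-en": 85.36, "ru-en": 81.56,
    "zh-cn-en": 78.76, "cs-uk": 86.87,
}


def direction(pair: str) -> str:
    """Classify a language pair as en-xx, xx-en, or xx-xx."""
    if pair.startswith("en-"):
        return "en-xx"
    if pair.endswith("-en"):
        return "xx-en"
    return "xx-xx"


# Group the scores by translation direction.
groups: dict[str, list[float]] = {}
for pair, score in scores.items():
    groups.setdefault(direction(pair), []).append(score)

# Assumption: each AVG column is an unweighted mean over language pairs.
averages = {name: round(mean(values), 2) for name, values in groups.items()}
averages["all"] = round(mean(scores.values()), 2)
print(averages)
```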

#### WMT-24
| Model | AVG | AVG en-xx | AVG xx-xx | en-de | en-es-latam | en-cs | en-ru | en-uk | en-ja | en-zh-cn | en-hi | cs-uk | ja-zh-cn |
|---------|------|------|-------|-------|-------|-------|-------|--------|--------|-------|-------|-------|-----|
| EuroLLM-1.7B-Instruct|79.35|79.45|78.96|79.20|81.17|80.82|79.00|80.54|82.39|80.80|71.69|83.16|74.76|
|Gemma-2B-EuroBlocks| 74.71|74.25|76.57|75.21|78.84|70.40|74.44|75.55|78.32|78.70|62.51|79.97|73.17|
|Gemma-7B-EuroBlocks| 80.88|80.45|82.60|80.43|81.91|80.14|80.32|82.17|84.08|81.86|72.71|85.55|79.65|

### General Benchmarks
We also compare EuroLLM-1.7B with [TinyLlama-v1.1](https://huggingface.co/TinyLlama/TinyLlama_v1.1) and [Gemma-2B](https://huggingface.co/google/gemma-2b) on two general benchmarks: Arc Challenge and Hellaswag.
For the non-English languages we use the [Okapi](https://aclanthology.org/2023.emnlp-demo.28.pdf) datasets.
Results show that EuroLLM-1.7B is superior to TinyLlama-v1.1 on both benchmarks, and similar to Gemma-2B on Hellaswag but worse on Arc Challenge. This may be due to the lower number of parameters of EuroLLM-1.7B (1.133B non-embedding parameters against 1.981B for Gemma-2B).

#### Arc Challenge
| Model              | Average | Average (non-English) | English | German | Spanish | French | Italian | Portuguese | Chinese | Russian | Dutch | Arabic | Swedish | Hindi  | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|--------------------|---------|-----------------------|---------|--------|---------|--------|---------|------------|---------|---------|-------|--------|---------|--------|-----------|----------|-----------|--------|---------|
| EuroLLM-1.7B-Instruct | 0.3268  | 0.3218  | 0.4070  | 0.3293  | 0.3521  | 0.3370  | 0.3422  | 0.3496  | 0.3060  | 0.3122   | 0.3174   | 0.2866   | 0.3373   | 0.2817   | 0.3031   | 0.3179   | 0.3199   | 0.3248   | 0.3310   |
| TinyLlama-v1.1       | 0.2650  | 0.2583  | 0.3712  | 0.2524  | 0.2795  | 0.2883  | 0.2652  | 0.2906  | 0.2410  | 0.2669   | 0.2404   | 0.2310   | 0.2687   | 0.2354   | 0.2449   | 0.2476   | 0.2524   | 0.2494   | 0.2796   |
| Gemma-2B             | 0.3617  | 0.3540  | 0.4846  | 0.3755  | 0.3940  | 0.4080  | 0.3687  | 0.3872  | 0.3726  | 0.3456   | 0.3328   | 0.3122   | 0.3519   | 0.2851   | 0.3039   | 0.3590   | 0.3601   | 0.3565   | 0.3516   |

#### Hellaswag
| Model              | Average | Average (non-English) | English | German | Spanish | French | Italian | Portuguese | Russian | Dutch  | Arabic | Swedish | Hindi  | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|--------------------|---------|-----------------------|---------|--------|---------|--------|---------|------------|---------|--------|--------|---------|--------|-----------|----------|-----------|--------|---------|
| EuroLLM-1.7B-Instruct | 0.4744  | 0.4654  | 0.6084  | 0.4772  | 0.5310  | 0.5260  | 0.5067  | 0.5206  | 0.4674  | 0.4893   | 0.4075   | 0.4813   | 0.3605   | 0.4067   | 0.4598   | 0.4368   | 0.4700   | 0.4405   |
| TinyLlama-v1.1       |0.3674  | 0.3503  | 0.6248  | 0.3650  | 0.4137  | 0.4010  | 0.3780  | 0.3892  | 0.3494  | 0.3588   | 0.2880   | 0.3561   | 0.2841   | 0.3073   | 0.3267   | 0.3349   | 0.3408   | 0.3613   |
| Gemma-2B             |0.4666  | 0.4499  | 0.7165  | 0.4756  | 0.5414  | 0.5180  | 0.4841  | 0.5081  | 0.4664  | 0.4655   | 0.3868   | 0.4383   | 0.3413   | 0.3710   | 0.4316   | 0.4291   | 0.4471   | 0.4448   |