---
license: apache-2.0
language:
- en
- de
- es
- fr
- it
- pt
- pl
- nl
- tr
- sv
- cs
- el
- hu
- ro
- fi
- uk
- sl
- sk
- da
- lt
- lv
- et
- bg
- 'no'
- ca
- hr
- ga
- mt
- gl
- zh
- ru
- ko
- ja
- ar
- hi
---

# Model Card for EuroLLM-1.7B
This is the model card for the first pre-trained model of the EuroLLM series: EuroLLM-1.7B. You can also check the instruction tuned version: EuroLLM-1.7B-Instruct.
- Developed by: Unbabel, Instituto Superior Técnico, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université, University of Turku, University of Oslo.
- Funded by: European Union.
- Model type: A 1.7B parameter multilingual transformer LLM.
- Language(s) (NLP): Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.
- License: Apache License 2.0.
## Model Details
The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages.
EuroLLM-1.7B is a 1.7B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: Web data, parallel data (en-xx and xx-en), and high-quality datasets.
EuroLLM-1.7B-Instruct was further instruction tuned on EuroBlocks, an instruction-tuning dataset with a focus on general instruction-following and machine translation.
### Model Description
EuroLLM uses a standard, dense Transformer architecture:
- We use grouped query attention (GQA) with 8 key-value heads, since it has been shown to increase speed at inference time while maintaining downstream performance.
- We perform pre-layer normalization, since it improves training stability, and use RMSNorm, which is faster.
- We use the SwiGLU activation function, since it has been shown to lead to good results on downstream tasks (see the sketch after this list).
- We use rotary positional embeddings (RoPE) in every layer, since these have been shown to lead to good performance while allowing the extension of the context length.
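To make the components above concrete, here is a minimal PyTorch sketch of the pre-norm RMSNorm and SwiGLU feed-forward described in the list, using the dimensions from the hyper-parameter table below. It is an illustration of the concepts only, not the actual EuroLLM training code, and the module names are made up for this example.

```python
# Illustrative sketch only: RMSNorm and a SwiGLU feed-forward with the
# embedding size (2,048) and FFN hidden size (5,632) listed below.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Scale by the root mean square of the activations (no mean subtraction),
        # which is what makes RMSNorm cheaper than standard LayerNorm.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class SwiGLUFeedForward(nn.Module):
    def __init__(self, dim: int = 2048, hidden: int = 5632):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # SwiGLU: a SiLU-gated linear unit followed by a down-projection.
        return self.down(F.silu(self.gate(x)) * self.up(x))


x = torch.randn(1, 8, 2048)                    # (batch, sequence, embedding)
y = x + SwiGLUFeedForward()(RMSNorm(2048)(x))  # pre-norm sub-layer + residual
print(y.shape)                                 # torch.Size([1, 8, 2048])
```

Grouped-query attention works analogously on the attention sub-layer: the 16 query heads share 8 key-value heads, so each key-value head serves two query heads.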
For pre-training, we use 256 Nvidia H100 GPUs of the MareNostrum 5 supercomputer, training the model with a constant batch size of 3,072 sequences, which corresponds to approximately 12 million tokens (3,072 sequences × 4,096 tokens), using the Adam optimizer and BF16 precision.
Here is a summary of the model hyper-parameters:
| Hyper-parameter | Value |
|---|---|
| Sequence Length | 4,096 |
| Number of Layers | 24 |
| Embedding Size | 2,048 |
| FFN Hidden Size | 5,632 |
| Number of Heads | 16 |
| Number of KV Heads (GQA) | 8 |
| Activation Function | SwiGLU |
| Position Encodings | RoPE (θ = 10,000) |
| Layer Norm | RMSNorm |
| Tied Embeddings | No |
| Embedding Parameters | 0.262B |
| LM Head Parameters | 0.262B |
| Non-embedding Parameters | 1.133B |
| Total Parameters | 1.657B |
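As a rough cross-check of the table, 0.262B embedding parameters at an embedding size of 2,048 imply a vocabulary of roughly 128k tokens (0.262B / 2,048 ≈ 128,000); with untied embeddings the LM head adds the same amount again, leaving about 1.133B non-embedding parameters out of 1.657B in total. The snippet below maps the table onto a Llama-style `transformers` configuration for illustration; this mapping and the derived vocabulary size are assumptions, not necessarily the exact configuration shipped with the checkpoint.

```python
# Hypothetical mapping of the table above onto a Llama-style config in
# transformers; an illustration, not the checkpoint's actual config.json.
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=128_000,             # assumed: 0.262B embedding params / 2,048 ≈ 128k
    hidden_size=2_048,              # Embedding Size
    intermediate_size=5_632,        # FFN Hidden Size
    num_hidden_layers=24,           # Number of Layers
    num_attention_heads=16,         # Number of Heads
    num_key_value_heads=8,          # Number of KV Heads (GQA)
    hidden_act="silu",              # SwiGLU feed-forward
    max_position_embeddings=4_096,  # Sequence Length
    rope_theta=10_000.0,            # RoPE theta
    tie_word_embeddings=False,      # Tied Embeddings: No
)

# Embedding + LM head parameters: 2 * 128,000 * 2,048 ≈ 0.524B (0.262B each),
# leaving roughly 1.13B non-embedding parameters out of the 1.66B total.
print(config)
```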
## Run the model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-1.7B"

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The base model is a plain language model: prompt it with a translation-style completion.
text = "English: My name is EuroLLM. Portuguese:"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
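If a GPU is available, you can also load the weights in BF16, the precision used during pre-training. A minimal variation of the snippet above, assuming a CUDA-capable PyTorch install and the `accelerate` package for `device_map`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # match the BF16 precision used for pre-training
    device_map="auto",           # place the weights on the available GPU(s)
)

text = "English: My name is EuroLLM. Portuguese:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```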
## Results
### Machine Translation
We evaluate EuroLLM-1.7B-Instruct on several machine translation benchmarks (FLORES-200, WMT-23, and WMT-24), comparing it with Gemma-2B and Gemma-7B (both also instruction tuned on EuroBlocks).
The results show that EuroLLM-1.7B is substantially better than Gemma-2B on machine translation and competitive with Gemma-7B.
#### FLORES-200

| Model | AVG | AVG en-xx | AVG xx-en | en-ar | en-bg | en-ca | en-cs | en-da | en-de | en-el | en-es-latam | en-et | en-fi | en-fr | en-ga | en-gl | en-hi | en-hr | en-hu | en-it | en-ja | en-ko | en-lt | en-lv | en-mt | en-nl | en-no | en-pl | en-pt-br | en-ro | en-ru | en-sk | en-sl | en-sv | en-tr | en-uk | en-zh-cn | ar-en | bg-en | ca-en | cs-en | da-en | de-en | el-en | es-latam-en | et-en | fi-en | fr-en | ga-en | gl-en | hi-en | hr-en | hu-en | it-en | ja-en | ko-en | lt-en | lv-en | mt-en | nl-en | no-en | pl-en | pt-br-en | ro-en | ru-en | sk-en | sl-en | sv-en | tr-en | uk-en | zh-cn-en |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EuroLLM-1.7B-Instruct | 86.10 | 85.53 | 86.67 | 83.87 | 88.36 | 84.42 | 88.34 | 88.77 | 86.63 | 86.71 | 85.99 | 86.98 | 87.13 | 87.21 | 72.25 | 85.97 | 74.78 | 82.96 | 85.51 | 87.77 | 89.26 | 86.27 | 86.31 | 86.22 | 67.38 | 86.95 | 88.68 | 87.38 | 89.13 | 88.39 | 87.47 | 87.51 | 85.32 | 89.20 | 86.24 | 86.33 | 86.17 | 85.80 | 87.20 | 87.53 | 87.53 | 89.26 | 88.71 | 86.49 | 86.55 | 87.60 | 88.17 | 88.90 | 79.89 | 87.59 | 87.53 | 86.10 | 86.34 | 87.54 | 86.25 | 86.08 | 85.03 | 85.60 | 78.16 | 86.80 | 89.96 | 85.24 | 88.85 | 88.42 | 85.86 | 87.17 | 86.36 | 89.48 | 86.76 | 86.06 | 85.88 |
| Gemma-2B-EuroBlocks | 81.56 | 78.93 | 84.18 | 75.25 | 82.46 | 83.17 | 82.17 | 84.40 | 83.20 | 79.63 | 84.15 | 72.63 | 81.00 | 85.12 | 38.79 | 82.00 | 67.00 | 81.18 | 78.24 | 84.80 | 87.08 | 82.04 | 73.02 | 68.41 | 56.67 | 83.30 | 86.69 | 83.07 | 86.82 | 84.00 | 84.55 | 77.93 | 76.19 | 80.77 | 79.76 | 84.19 | 84.10 | 83.67 | 85.73 | 86.89 | 86.38 | 88.39 | 88.11 | 84.68 | 86.11 | 83.45 | 86.45 | 88.22 | 50.88 | 86.44 | 85.87 | 85.33 | 85.16 | 86.75 | 85.62 | 85.00 | 81.55 | 81.45 | 67.90 | 85.95 | 89.05 | 84.18 | 88.27 | 87.38 | 85.13 | 85.22 | 83.86 | 87.83 | 84.96 | 85.15 | 85.10 |
| Gemma-7B-EuroBlocks | 86.16 | 85.49 | 86.82 | 83.39 | 88.32 | 85.82 | 88.88 | 89.01 | 86.96 | 86.62 | 86.31 | 84.42 | 88.11 | 87.46 | 61.85 | 86.10 | 77.91 | 87.01 | 85.81 | 87.57 | 89.88 | 87.24 | 84.47 | 83.15 | 67.13 | 86.50 | 90.44 | 87.57 | 89.22 | 89.13 | 88.58 | 86.73 | 84.68 | 88.16 | 86.87 | 88.40 | 87.11 | 86.65 | 87.25 | 88.17 | 87.47 | 89.59 | 88.44 | 86.76 | 86.66 | 87.55 | 88.88 | 88.86 | 73.46 | 87.63 | 88.43 | 87.12 | 87.31 | 87.49 | 87.20 | 87.15 | 85.16 | 85.96 | 78.39 | 86.73 | 90.52 | 85.38 | 89.17 | 88.75 | 86.35 | 86.82 | 86.21 | 89.39 | 88.20 | 86.45 | 86.28 |
#### WMT-23

| Model | AVG | AVG en-xx | AVG xx-en | AVG xx-xx | en-de | en-cs | en-uk | en-ru | en-zh-cn | de-en | uk-en | ru-en | zh-cn-en | cs-uk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EuroLLM-1.7B-Instruct | 82.56 | 82.30 | 82.07 | 85.81 | 80.99 | 84.42 | 80.74 | 81.94 | 83.42 | 83.74 | 85.06 | 81.00 | 78.49 | 85.81 |
| Gemma-2B-EuroBlocks | 79.86 | 78.35 | 81.32 | 81.56 | 76.54 | 76.35 | 77.62 | 78.88 | 82.36 | 82.85 | 83.83 | 80.17 | 78.42 | 81.56 |
| Gemma-7B-EuroBlocks | 83.90 | 83.70 | 83.21 | 87.61 | 82.15 | 84.68 | 83.05 | 83.85 | 84.79 | 84.40 | 85.86 | 82.55 | 80.01 | 87.61 |
#### WMT-24

| Model | AVG | AVG en-xx | AVG xx-xx | en-es-latam | en-cs | en-ru | en-uk | en-ja | en-zh-cn | en-hi | cs-uk | ja-zh-cn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EuroLLM-1.7B-Instruct | 78.45 | 78.65 | 77.67 | 79.05 | 80.93 | 80.33 | 78.05 | 78.72 | 81.87 | 80.15 | 70.10 | 82.65 |
| Gemma-2B-EuroBlocks | 74.71 | 74.25 | 76.57 | 75.21 | 78.84 | 70.40 | 74.44 | 75.55 | 78.32 | 78.70 | 62.51 | 79.97 |
| Gemma-7B-EuroBlocks | 80.88 | 80.45 | 82.60 | 80.43 | 81.91 | 80.14 | 80.32 | 82.17 | 84.08 | 81.86 | 72.71 | 85.55 |
### General Benchmarks
We also compare EuroLLM-1.7B with TinyLlama-1.1-3T and Gemma-2B on three general benchmarks: Arc Challenge, Hellaswag, and MMLU.
For the non-English languages, we use the Okapi datasets.
The results show that EuroLLM-1.7B is superior to TinyLlama-1.1-3T and similar to Gemma-2B on Hellaswag, but worse on Arc Challenge and MMLU. This may be due to EuroLLM-1.7B's lower number of non-embedding parameters (1.133B versus 1.981B for Gemma-2B).
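This card does not prescribe a specific evaluation harness for these benchmarks. As one possible way to obtain comparable numbers on the English versions, here is a hedged sketch using EleutherAI's lm-evaluation-harness Python API; the task names, few-shot settings, and defaults are assumptions, and the multilingual Okapi variants would need their corresponding harness tasks.

```python
# Hypothetical reproduction sketch with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). The exact setup used for the scores below is not
# stated in this card, so treat task names and defaults as assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=utter-project/EuroLLM-1.7B,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "mmlu"],  # English versions only
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, ...) live under "results".
print(results["results"])
```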
#### Arc Challenge

| Model | Average | English | German | Spanish | French | Italian | Portuguese | Chinese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EuroLLM-1.7B | 0.3130 | 0.4215 | 0.3148 | 0.3376 | 0.3259 | 0.3396 | 0.3410 | 0.3068 | 0.2626 | 0.3037 | 0.2652 | 0.3279 | 0.2688 | 0.3039 | 0.3085 | 0.2943 | 0.2956 | 0.3027 |
| TinyLlama-1.1-3T | 0.2621 | 0.3473 | 0.2541 | 0.2726 | 0.2797 | 0.2643 | 0.2829 | 0.2573 | 0.2421 | 0.2404 | 0.2335 | 0.2661 | 0.2337 | 0.244 | 0.2536 | 0.2626 | 0.2476 | 0.2736 |
| Gemma-2B | 0.3617 | 0.4846 | 0.3755 | 0.3940 | 0.4080 | 0.3687 | 0.3872 | 0.3726 | 0.3456 | 0.3328 | 0.3122 | 0.3519 | 0.2851 | 0.3039 | 0.3590 | 0.3601 | 0.3565 | 0.3516 |
#### Hellaswag

| Model | Average | English | German | Spanish | French | Italian | Portuguese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EuroLLM-1.7B | 0.4653 | 0.6199 | 0.4653 | 0.5187 | 0.5173 | 0.5024 | 0.5116 | 0.4582 | 0.4821 | 0.3939 | 0.4722 | 0.3505 | 0.3970 | 0.4441 | 0.4224 | 0.4556 | 0.4329 |
| TinyLlama-1.1-3T | 0.3710 | 0.6027 | 0.3652 | 0.4136 | 0.4104 | 0.3780 | 0.4008 | 0.3544 | 0.3637 | 0.2981 | 0.3569 | 0.2904 | 0.3147 | 0.3337 | 0.3440 | 0.3464 | 0.3628 |
| Gemma-2B | 0.4666 | 0.7165 | 0.4756 | 0.5414 | 0.5180 | 0.4841 | 0.5081 | 0.4664 | 0.4655 | 0.3868 | 0.4383 | 0.3413 | 0.3710 | 0.4316 | 0.4291 | 0.4471 | 0.4448 |
#### MMLU

| Model | Average | English | German | Spanish | French | Italian | Portuguese | Chinese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EuroLLM-1.7B | 0.2631 | 0.2553 | 0.2626 | 0.2653 | 0.2589 | 0.2628 | 0.2634 | 0.2546 | 0.2626 | 0.2677 | 0.2608 | 0.2656 | 0.2690 | 0.2551 | 0.2677 | 0.2655 | 0.2675 | 0.2689 |
| TinyLlama-1.1-3T | 0.2546 | 0.2604 | 0.2498 | 0.2528 | 0.2535 | 0.2531 | 0.2511 | 0.2629 | 0.2541 | 0.2521 | 0.2591 | 0.2528 | 0.2550 | 0.2566 | 0.2548 | 0.2651 | 0.2419 | 0.2528 |
| Gemma-2B | 0.3356 | 0.4168 | 0.3519 | 0.3475 | 0.3463 | 0.3433 | 0.3383 | 0.3345 | 0.3261 | 0.3429 | 0.3158 | 0.3318 | 0.2842 | 0.3185 | 0.3243 | 0.3152 | 0.3377 | 0.3307 |