Upload 8 files
- README.md +154 -3
- config.json +27 -0
- generation_config.json +6 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +23 -0
- tokenizer.json +0 -0
- tokenizer.model +3 -0
- tokenizer_config.json +42 -0
README.md
CHANGED
@@ -1,3 +1,154 @@
---
license: apache-2.0
language:
- en
- de
- es
- fr
- it
- pt
- pl
- nl
- tr
- sv
- cs
- el
- hu
- ro
- fi
- uk
- sl
- sk
- da
- lt
- lv
- et
- bg
- no
- ca
- hr
- ga
- mt
- gl
- zh
- ru
- ko
- ja
- ar
- hi
---

# Model Card for EuroLLM-1.7B

This is the model card for the first models of the EuroLLM series: EuroLLM-1.7B and [EuroLLM-1.7B-EuroBlocks-v0.1](https://huggingface.co/Unbabel/EuroLLM-1B-EuroBlocks-v0.1).

- **Developed by:** Unbabel, Instituto Superior Técnico, University of Edinburgh, CentraleSupélec University of Paris-Saclay.
- **Funded by:** European Union.
- **Model type:** A 1.7B-parameter multilingual transformer LLM.
- **Language(s) (NLP):** Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.
- **License:** Apache License 2.0.

## Model Details

The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages, as well as some additional relevant languages.
EuroLLM-1.7B is a 1.7B-parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: web data, parallel data (en-xx and xx-en), and high-quality datasets.
EuroLLM-1.7B-EuroBlocks-v0.1 was further instruction-tuned on EuroBlocks-v0.1, an instruction-tuning dataset that predominantly focuses on machine translation.

### Model Description

EuroLLM uses a standard, dense Transformer architecture:
- We use grouped query attention (GQA) with 8 key-value heads, since it has been shown to increase speed at inference time while maintaining downstream performance.
- We perform pre-layer normalization, since it improves training stability, and use RMSNorm, which is faster.
- We use the SwiGLU activation function, since it has been shown to lead to good results on downstream tasks.
- We use rotary positional embeddings (RoPE) in every layer, since these have been shown to lead to good performance while allowing the extension of the context length.

For pre-training, we use 256 Nvidia H100 GPUs of the MareNostrum 5 supercomputer, training the model with a constant batch size of 3,072 sequences (approximately 12 million tokens), using the Adam optimizer and BF16 precision.
Here is a summary of the model hyper-parameters:

| Hyper-parameter | Value |
|--------------------------------------|----------------------|
| Sequence Length | 4,096 |
| Number of Layers | 24 |
| Embedding Size | 2,048 |
| FFN Hidden Size | 5,632 |
| Number of Heads | 16 |
| Number of KV Heads (GQA) | 8 |
| Activation Function | SwiGLU |
| Position Encodings | RoPE (Θ = 10,000) |
| Layer Norm | RMSNorm |
| Tied Embeddings | No |
| Embedding Parameters | 0.262B |
| LM Head Parameters | 0.262B |
| Non-embedding Parameters | 1.133B |
| Total Parameters | 1.657B |
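
These counts can be sanity-checked from the table itself. As a quick illustration (not from the original card), the arithmetic below recomputes the breakdown from the hyper-parameters, assuming untied input/output embeddings, the 128,000-token vocabulary from config.json, and standard Llama-style projection shapes:

```python
# Recompute EuroLLM-1.7B parameter counts from its hyper-parameters.
# Illustrative sketch; shapes assume a standard Llama-style block.
vocab, d_model, n_layers = 128_000, 2048, 24
n_heads, n_kv_heads, d_ffn = 16, 8, 5632
head_dim = d_model // n_heads  # 128

embedding = vocab * d_model  # input embeddings: ~262.1M
lm_head = vocab * d_model    # untied output head: ~262.1M

attn = (d_model * d_model                       # Q projection
        + 2 * d_model * n_kv_heads * head_dim   # K and V (GQA: 8 KV heads)
        + d_model * d_model)                    # output projection
mlp = 3 * d_model * d_ffn                       # SwiGLU: gate, up, down
per_layer = attn + mlp + 2 * d_model            # + two RMSNorm weight vectors

non_embedding = n_layers * per_layer + d_model  # + final RMSNorm
total = embedding + lm_head + non_embedding

print(f"embedding      {embedding / 1e9:.3f}B")      # ~0.262B
print(f"lm head        {lm_head / 1e9:.3f}B")        # ~0.262B
print(f"non-embedding  {non_embedding / 1e9:.3f}B")  # ~1.133B
print(f"total          {total / 1e9:.3f}B")          # ~1.657B
```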

## Run the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EuroLLM/EuroLLM-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The base model is prompted with a simple "source: ... target:" pattern.
text = "English: My name is EuroLLM. Portuguese:"

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
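```

Not part of the original card: since the checkpoint is stored in bfloat16 (see `torch_dtype` in config.json), you may prefer to load it in that dtype and run on a GPU. A sketch, assuming a CUDA device is available:

```python
# Optional variant: load in the checkpoint's native bfloat16 on GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EuroLLM/EuroLLM-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")  # assumes a CUDA GPU

inputs = tokenizer("English: My name is EuroLLM. Portuguese:", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```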

## Results

### Machine Translation

We evaluate EuroLLM-1.7B-EuroBlocks-v0.1 on several machine translation benchmarks (FLORES-200, WMT-23, and WMT-24), comparing it with [Gemma-2B](https://huggingface.co/google/gemma-2b) and [Gemma-7B](https://huggingface.co/google/gemma-7b) (also instruction-tuned on EuroBlocks-v0.1).
The results show that EuroLLM-1.7B is substantially better than Gemma-2B at machine translation and competitive with Gemma-7B.
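
The card does not name the metric, but the score range in the tables below is typical of neural MT quality metrics such as COMET-22. Purely as an illustration, and assuming a COMET-style metric, translations could be scored with Unbabel's `unbabel-comet` package:

```python
# Hedged sketch: scoring translations with COMET-22 (assumed metric,
# not confirmed by the card).  pip install unbabel-comet
from comet import download_model, load_from_checkpoint

ckpt = download_model("Unbabel/wmt22-comet-da")
comet = load_from_checkpoint(ckpt)

data = [{
    "src": "My name is EuroLLM.",   # source sentence
    "mt": "O meu nome é EuroLLM.",  # model translation
    "ref": "Chamo-me EuroLLM.",     # reference translation
}]
print(comet.predict(data, batch_size=8, gpus=0).system_score)
```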

#### Flores-200
| Model | AVG | AVG en-xx | AVG xx-en | en-ar | en-bg | en-ca | en-cs | en-da | en-de | en-el | en-es-latam | en-et | en-fi | en-fr | en-ga | en-gl | en-hi | en-hr | en-hu | en-it | en-ja | en-ko | en-lt | en-lv | en-mt | en-nl | en-no | en-pl | en-pt-br | en-ro | en-ru | en-sk | en-sl | en-sv | en-tr | en-uk | en-zh-cn | ar-en | bg-en | ca-en | cs-en | da-en | de-en | el-en | es-latam-en | et-en | fi-en | fr-en | ga-en | gl-en | hi-en | hr-en | hu-en | it-en | ja-en | ko-en | lt-en | lv-en | mt-en | nl-en | no-en | pl-en | pt-br-en | ro-en | ru-en | sk-en | sl-en | sv-en | tr-en | uk-en | zh-cn-en |
|--------------------------------|------|-----------|-----------|-------|-------|-------|-------|-------|-------|-------|--------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----------|-------|-------|-------|-------|-------|-------|-------|----------|-------|-------|-------|-------|-------|-------|-------|--------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|----------|-------|-------|-------|-------|-------|-------|-------|----------|
| EuroLLM-1.7B-EuroBlocks-v0.1 | 86.10 | 85.53 | 86.67 | 83.87 | 88.36 | 84.42 | 88.34 | 88.77 | 86.63 | 86.71 | 85.99 | 86.98 | 87.13 | 87.21 | 72.25 | 85.97 | 74.78 | 82.96 | 85.51 | 87.77 | 89.26 | 86.27 | 86.31 | 86.22 | 67.38 | 86.95 | 88.68 | 87.38 | 89.13 | 88.39 | 87.47 | 87.51 | 85.32 | 89.20 | 86.24 | 86.33 | 86.17 | 85.80 | 87.20 | 87.53 | 87.53 | 89.26 | 88.71 | 86.49 | 86.55 | 87.60 | 88.17 | 88.90 | 79.89 | 87.59 | 87.53 | 86.10 | 86.34 | 87.54 | 86.25 | 86.08 | 85.03 | 85.60 | 78.16 | 86.80 | 89.96 | 85.24 | 88.85 | 88.42 | 85.86 | 87.17 | 86.36 | 89.48 | 86.76 | 86.06 | 85.88 |
| Gemma-2B-EuroBlocks-v0.1 | 81.56 | 78.93 | 84.18 | 75.25 | 82.46 | 83.17 | 82.17 | 84.40 | 83.20 | 79.63 | 84.15 | 72.63 | 81.00 | 85.12 | 38.79 | 82.00 | 67.00 | 81.18 | 78.24 | 84.80 | 87.08 | 82.04 | 73.02 | 68.41 | 56.67 | 83.30 | 86.69 | 83.07 | 86.82 | 84.00 | 84.55 | 77.93 | 76.19 | 80.77 | 79.76 | 84.19 | 84.10 | 83.67 | 85.73 | 86.89 | 86.38 | 88.39 | 88.11 | 84.68 | 86.11 | 83.45 | 86.45 | 88.22 | 50.88 | 86.44 | 85.87 | 85.33 | 85.16 | 86.75 | 85.62 | 85.00 | 81.55 | 81.45 | 67.90 | 85.95 | 89.05 | 84.18 | 88.27 | 87.38 | 85.13 | 85.22 | 83.86 | 87.83 | 84.96 | 85.15 | 85.10 |
| Gemma-7B-EuroBlocks-v0.1 | 86.16 | 85.49 | 86.82 | 83.39 | 88.32 | 85.82 | 88.88 | 89.01 | 86.96 | 86.62 | 86.31 | 84.42 | 88.11 | 87.46 | 61.85 | 86.10 | 77.91 | 87.01 | 85.81 | 87.57 | 89.88 | 87.24 | 84.47 | 83.15 | 67.13 | 86.50 | 90.44 | 87.57 | 89.22 | 89.13 | 88.58 | 86.73 | 84.68 | 88.16 | 86.87 | 88.40 | 87.11 | 86.65 | 87.25 | 88.17 | 87.47 | 89.59 | 88.44 | 86.76 | 86.66 | 87.55 | 88.88 | 88.86 | 73.46 | 87.63 | 88.43 | 87.12 | 87.31 | 87.49 | 87.20 | 87.15 | 85.16 | 85.96 | 78.39 | 86.73 | 90.52 | 85.38 | 89.17 | 88.75 | 86.35 | 86.82 | 86.21 | 89.39 | 88.20 | 86.45 | 86.28 |

#### WMT-23
| Model | AVG | AVG en-xx | AVG xx-en | AVG xx-xx | en-de | en-cs | en-uk | en-ru | en-zh-cn | de-en | uk-en | ru-en | zh-cn-en | cs-uk |
|--------------------------------|------|-----------|-----------|-----------|-------|-------|-------|-------|----------|-------|-------|-------|----------|-------|
| EuroLLM-1.7B-EuroBlocks-v0.1 | 82.56 | 82.30 | 82.07 | 85.81 | 80.99 | 84.42 | 80.74 | 81.94 | 83.42 | 83.74 | 85.06 | 81.00 | 78.49 | 85.81 |
| Gemma-2B-EuroBlocks-v0.1 | 79.86 | 78.35 | 81.32 | 81.56 | 76.54 | 76.35 | 77.62 | 78.88 | 82.36 | 82.85 | 83.83 | 80.17 | 78.42 | 81.56 |
| Gemma-7B-EuroBlocks-v0.1 | 83.90 | 83.70 | 83.21 | 87.61 | 82.15 | 84.68 | 83.05 | 83.85 | 84.79 | 84.40 | 85.86 | 82.55 | 80.01 | 87.61 |

#### WMT-24
| Model | AVG | AVG en-xx | AVG xx-xx | en-de | en-es-latam | en-cs | en-ru | en-uk | en-ja | en-zh-cn | en-hi | cs-uk | ja-zh-cn |
|--------------------------------|------|-----------|-----------|-------|-------------|-------|-------|-------|-------|----------|-------|-------|----------|
| EuroLLM-1.7B-EuroBlocks-v0.1 | 78.45 | 78.65 | 77.67 | 79.05 | 80.93 | 80.33 | 78.05 | 78.72 | 81.87 | 80.15 | 70.10 | 82.65 | 72.69 |
| Gemma-2B-EuroBlocks-v0.1 | 74.71 | 74.25 | 76.57 | 75.21 | 78.84 | 70.40 | 74.44 | 75.55 | 78.32 | 78.70 | 62.51 | 79.97 | 73.17 |
| Gemma-7B-EuroBlocks-v0.1 | 80.88 | 80.45 | 82.60 | 80.43 | 81.91 | 80.14 | 80.32 | 82.17 | 84.08 | 81.86 | 72.71 | 85.55 | 79.65 |

### General Benchmarks
We also compare EuroLLM-1.7B with [TinyLlama-1.1-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) and [Gemma-2B](https://huggingface.co/google/gemma-2b) on 3 general benchmarks: Arc Challenge, Hellaswag, and MMLU.
For the non-English languages, we use the [Okapi](https://aclanthology.org/2023.emnlp-demo.28.pdf) datasets.
Results show that EuroLLM-1.7B is superior to TinyLlama-1.1-3T and similar to Gemma-2B on Hellaswag, but worse on Arc Challenge and MMLU. This may be due to EuroLLM-1.7B's lower number of non-embedding parameters (1.133B against Gemma-2B's 1.981B).
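
The card does not spell out the evaluation protocol. The usual approach for these benchmarks is likelihood-based multiple choice: score each candidate answer by its length-normalized log-probability under the model and pick the highest. A minimal sketch of that idea (not necessarily the exact harness behind the numbers below):

```python
# Hedged sketch: likelihood-based multiple-choice scoring, the common
# protocol for Arc Challenge / Hellaswag / MMLU-style benchmarks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EuroLLM/EuroLLM-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def choice_logprob(context: str, continuation: str) -> float:
    """Average log-probability of `continuation` given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probs of the continuation tokens only (shift targets by one).
    cont_len = full_ids.shape[1] - ctx_ids.shape[1]
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    cont_lp = log_probs[-cont_len:].gather(1, targets[-cont_len:, None]).squeeze(1)
    return cont_lp.mean().item()  # length-normalized

# Hypothetical example item, for illustration only.
question = "Question: What is the boiling point of water at sea level?\nAnswer:"
choices = [" 100 degrees Celsius", " 0 degrees Celsius", " 50 degrees Celsius"]
print(max(choices, key=lambda c: choice_logprob(question, c)))
```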

#### Arc Challenge
| Model | Average | English | German | Spanish | French | Italian | Portuguese | Chinese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|--------------------|---------|---------|--------|---------|--------|---------|------------|---------|---------|-------|--------|---------|--------|-----------|----------|-----------|--------|---------|
| EuroLLM-1.7B | 0.3130 | 0.4215 | 0.3148 | 0.3376 | 0.3259 | 0.3396 | 0.3410 | 0.3068 | 0.2626 | 0.3037 | 0.2652 | 0.3279 | 0.2688 | 0.3039 | 0.3085 | 0.2943 | 0.2956 | 0.3027 |
| TinyLlama-1.1-3T | 0.2621 | 0.3473 | 0.2541 | 0.2726 | 0.2797 | 0.2643 | 0.2829 | 0.2573 | 0.2421 | 0.2404 | 0.2335 | 0.2661 | 0.2337 | 0.2440 | 0.2536 | 0.2626 | 0.2476 | 0.2736 |
| Gemma-2B | 0.3617 | 0.4846 | 0.3755 | 0.3940 | 0.4080 | 0.3687 | 0.3872 | 0.3726 | 0.3456 | 0.3328 | 0.3122 | 0.3519 | 0.2851 | 0.3039 | 0.3590 | 0.3601 | 0.3565 | 0.3516 |

#### Hellaswag
| Model | Average | English | German | Spanish | French | Italian | Portuguese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|--------------------|---------|---------|--------|---------|--------|---------|------------|---------|--------|--------|---------|--------|-----------|----------|-----------|--------|---------|
| EuroLLM-1.7B | 0.4653 | 0.6199 | 0.4653 | 0.5187 | 0.5173 | 0.5024 | 0.5116 | 0.4582 | 0.4821 | 0.3939 | 0.4722 | 0.3505 | 0.3970 | 0.4441 | 0.4224 | 0.4556 | 0.4329 |
| TinyLlama-1.1-3T | 0.3710 | 0.6027 | 0.3652 | 0.4136 | 0.4104 | 0.3780 | 0.4008 | 0.3544 | 0.3637 | 0.2981 | 0.3569 | 0.2904 | 0.3147 | 0.3337 | 0.3440 | 0.3464 | 0.3628 |
| Gemma-2B | 0.4666 | 0.7165 | 0.4756 | 0.5414 | 0.5180 | 0.4841 | 0.5081 | 0.4664 | 0.4655 | 0.3868 | 0.4383 | 0.3413 | 0.3710 | 0.4316 | 0.4291 | 0.4471 | 0.4448 |

#### MMLU
| Model | Average | English | German | Spanish | French | Italian | Portuguese | Chinese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|--------------------|---------|---------|--------|---------|--------|---------|------------|---------|---------|--------|--------|---------|--------|-----------|----------|-----------|--------|---------|
| EuroLLM-1.7B | 0.2631 | 0.2553 | 0.2626 | 0.2653 | 0.2589 | 0.2628 | 0.2634 | 0.2546 | 0.2626 | 0.2677 | 0.2608 | 0.2656 | 0.2690 | 0.2551 | 0.2677 | 0.2655 | 0.2675 | 0.2689 |
| TinyLlama-1.1-3T | 0.2546 | 0.2604 | 0.2498 | 0.2528 | 0.2535 | 0.2531 | 0.2511 | 0.2629 | 0.2541 | 0.2521 | 0.2591 | 0.2528 | 0.2550 | 0.2566 | 0.2548 | 0.2651 | 0.2419 | 0.2528 |
| Gemma-2B | 0.3356 | 0.4168 | 0.3519 | 0.3475 | 0.3463 | 0.3433 | 0.3383 | 0.3345 | 0.3261 | 0.3429 | 0.3158 | 0.3318 | 0.2842 | 0.3185 | 0.3243 | 0.3152 | 0.3377 | 0.3307 |
config.json
ADDED
@@ -0,0 +1,27 @@
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.1",
  "use_cache": true,
  "vocab_size": 128000
}
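
As an aside (not part of the upload), the attention geometry implied by this config can be read off directly. A small sketch, assuming the file above is saved locally as config.json:

```python
# Derive the attention geometry from config.json (hypothetical local copy).
import json

with open("config.json") as f:
    cfg = json.load(f)

head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]           # 2048 / 16 = 128
gqa_group = cfg["num_attention_heads"] // cfg["num_key_value_heads"]  # 16 / 8 = 2

print(f"head_dim={head_dim}, query heads per KV head={gqa_group}")
# With 8 KV heads, the bf16 KV cache per token is
# 2 (K and V) * 8 heads * 128 dims * 2 bytes * 24 layers = 98,304 bytes.
```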
generation_config.json
ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.40.1"
}
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6f0204cca6d6184a1f3262c412b3865199617f8f5e033b0bdabc89ead485f387
size 135
special_tokens_map.json
ADDED
@@ -0,0 +1,23 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:62a79e1916e2ca26a1836a27da791516da5f68304eed89a18bf87b47741fd713
size 132
tokenizer_config.json
ADDED
@@ -0,0 +1,42 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
}
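
One practical consequence of this config (an illustrative note, not part of the upload): with "add_bos_token": true and "add_eos_token": false, encoded prompts start with <s> and carry no trailing </s>, so generation stops on the model-produced </s>. A quick check:

```python
# Verify the BOS/EOS insertion implied by tokenizer_config.json.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EuroLLM/EuroLLM-1.7B")
ids = tokenizer("Hello").input_ids
print(ids[0] == tokenizer.bos_token_id)  # True: <s> (id 1) is prepended
print(tokenizer.eos_token_id in ids)     # False: no </s> appended
```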