OSainz committed on
Commit 7591d6d
1 Parent(s): d955103

Update README.md

Files changed (1):
  1. README.md +157 -331

README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 license: llama2
 datasets:
- - HiTZ/euscrawl
 language:
 - eu
 - en
@@ -10,21 +10,111 @@ metrics:
 - f1
 - perplexity
 pipeline_tag: text-generation
 ---

 # **Model Card for Latxa 70b**

 ![Latxa](latxa.jpeg)

- Latxa is a collection of foundation models specifically tuned for Basque. Based on Meta’s LLaMA 2 model family, these models were further trained with Euscrawl, a highly curated Basque corpora ([Artetxe et al., 2022](https://aclanthology.org/2022.emnlp-main.499/)). Ranging from 7 billion to 70 billion parameters, these models are currently the biggest and best-performing LLMs built for Basque. This is the 70b repository, links to other models can be found in the [Latxa Collection](https://huggingface.co/collections/HiTZ/latxa-65a697e6838b3acc53677304).

 # **Model Details**

 ## **Model Description**

- Latxa is a family of Large Language Models (LLM) based on Meta’s [LLaMA models](https://huggingface.co/meta-llama). Current LLMs exhibit incredible performance for high-resource languages such as English, but, in the case of Basque and other low-resource languages, their performance is close to a random guesser. These limitations widen the gap between high- and low-resource languages when it comes to digital development. We present Latxa to overcome these limitations and promote the development of LLM-based technology and research for the Basque language. Latxa models follow the same architecture as their original counterparts and were further trained in Euscrawl v1 ([Artetxe et al., 2022](https://aclanthology.org/2022.emnlp-main.499/)), a high-quality Basque corpora.

 The models are released in three sizes: 7B, 13B and 70B.
@@ -46,7 +136,7 @@ Use the code below to get started with the model.

 from transformers import pipeline

- pipe = pipeline("text-generation", model=”HiTZ/latxa-70b-v1)

 text = "Euskara adimen artifizialera iritsi da!"
@@ -79,9 +169,9 @@ The model was not fine-tuned to follow instructions or to work as a chat assista

 # **Bias, Risks, and Limitations**

- In an effort to alleviate the potentially disturbing or harmful content, Latxa has been trained on carefully selected and processed data which comes mainly from local media, national/regional newspapers, encyclopedias and blogs (see Euscrawl below). Still, the model is based on LLaMA models and can potentially carry the same bias, risk and limitations.

- Please see the LLaMA’s _Ethical Considerations and Limitations _for further information.

 # **Training Details**
@@ -89,109 +179,17 @@ Please see the LLaMA’s _Ethical Considerations and Limitations _for further in

 ## **Training Data**

- The models were trained on EusCrawl v1, a high-quality corpus for Basque comprising 1.72M documents, 288M words, totalling 2.1GiB of uncompressed text. EusCrawl was built using ad-hoc scrapers to extract text from 33 Basque websites with high-quality content, resulting in cleaner text compared to general-purpose approaches.

- See more details in the [EusCrawl](https://huggingface.co/datasets/HiTZ/euscrawl) dataset card.

- Additionally, 100K documents of English data randomly selected from the [Pile](https://huggingface.co/datasets/EleutherAI/pile) dataset were also included to avoid catastrophic forgetting.

 ## **Training Procedure**

- The models were trained using the GPT-Neox library on the HPC CINECA computing cluster. All the models were approximately trained with an effective batch size of 2M tokens for 1000 to 2000 steps.

- | Model     | Steps | Sequence length | Effective batch size | Total tokens | GPU hours |
- |-----------|-------|-----------------|----------------------|--------------|-----------|
- | Latxa 7B  | 2000  | 4096            | 2M tokens/step       | 4B           | 359.2h    |
- | Latxa 13B | 1000  | 4096            | 2M tokens/step       | 2B           | 468.8h    |
- | Latxa 70B | 1680  | 4096            | 2M tokens/step       | 3.4B         | *6475.52h |

- * indicates the time for the entire training process (2000 steps), however the weights of the step 1680 are shared as it is the best checkpoint according to validation loss.

 # **Evaluation**

@@ -219,13 +217,19 @@ We evaluated the models on zero-shot and few-shot settings on generative, multip

 * **EpecKorrefBin**: Coreference detection task similar to WSC.
 * **QNLIeu**: Q&A NLI built from the Basque Wikipedia.
 * **WiCeu**: Basque Word-in-Context task.
-

 ### **Metrics**

-
- * **Accuracy**: Belebele, X-StoryCloze, EpecKorrefBin, QNLI-eu, and, WiC-eu
 * **Micro F1**: BEC2016-eu and BHTCv2
 * **Macro F1**: VaxxStance (favor & against)
@@ -235,244 +239,66 @@ We evaluated the models on zero-shot and few-shot settings on generative, multip

 The model was evaluated using the LM Evaluation Harness library from Eleuther AI. In order to reproduce our results, please refer to our [fork](https://github.com/naiarapm/lm-evaluation-harness/tree/basqueglue), which includes the implementation for the mentioned datasets.

- | Model | Belebele | X-StoryCloze | BEC | Vaxx | BHTC | coref | QNLI | WiC | Average |
- |-------|----------|--------------|-----|------|------|-------|------|-----|---------|
- | Random | 25.00 | 50.00 | 33.33 | 33.33 | 8.33 | 50.00 | 50.00 | 50.00 | 37.50 |
- | LLaMA 2 7B | 26.22 | 50.43 | 41.63 | 18.60 | 20.06 | 50.94 | 48.32 | 49.64 | 38.23 |
- | LLaMA 2 13B | 32.00 | 50.63 | 41.09 | 18.25 | 27.35 | 49.23 | 48.74 | 49.21 | 39.56 |
- | LLaMA 2 70B | 33.56 | 51.62 | 47.47 | 21.01 | 31.01 | 52.98 | 51.26 | 51.57 | 42.56 |
- | BLOOM 7B | 27.00 | 57.18 | 37.94 | 20.72 | 39.10 | 48.21 | 47.48 | 47.57 | 40.65 |
- | XGLM 7B | 23.88 | 57.71 | 39.94 | 21.58 | 36.73 | 50.94 | 50.42 | 49.21 | 41.30 |
- | **Latxa 7B** | 35.67 | 63.13 | 55.61 | 45.93 | 44.44 | 50.43 | 55.04 | 50.14 | 50.05 |
- | **Latxa 13B** | 53.56 | 65.85 | 53.23 | 48.66 | **53.61** | 62.52 | 57.14 | 54.21 | 56.10 |
- | **Latxa 70B** | **71.78** | **67.57** | **63.52** | **48.95** | 49.51 | **79.90** | **58.82** | **55.50** | **61.94** |

 # **Environmental Impact**

 Carbon emissions are estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

 * **Hardware Type:** HPC Cluster, 4x A100 64GB nodes
- * **Hours used:** 359.2h + 468.8h + 6475.52h = 7303.52h
 * **Compute cluster:** CINECA HPC
 * **Compute Region:** Italy
- * **Carbon Emitted:** 673.75kg CO<sub>2</sub> eq

 # **Acknowledgements**

- This work has been partially supported by the Basque Government (IKER-GAITU project). The models were trained on the Leonardo supercomputer at CINECA under the EuroHPC Joint Undertaking, project EHPC-EXT-2023E01-013.
 ---
 license: llama2
 datasets:
+ - HiTZ/latxa-corpus-v1.1
 language:
 - eu
 - en

 - f1
 - perplexity
 pipeline_tag: text-generation
+ model-index:
+ - name: Latxa-70b-v1.1
+   results:
+   - task:
+       type: multiple_choice
+     dataset:
+       name: xstory_cloze
+       type: XStory
+     metrics:
+     - name: Accuracy (0-shot)
+       type: Accuracy (0-shot)
+       value: 69.76
+     source:
+       name: Paper
+       url: https://paper-url.com
+   - task:
+       type: multiple_choice
+     dataset:
+       name: belebele
+       type: Belebele
+     metrics:
+     - name: Accuracy (5-shot)
+       type: Accuracy (5-shot)
+       value: 64.89
+     source:
+       name: Paper
+       url: https://paper-url.com
+   - task:
+       type: mix
+     dataset:
+       name: basque_glue
+       type: BasqueGLUE
+     metrics:
+     - name: Average scores (5-shot)
+       type: Average scores (5-shot)
+       value: 61.66
+     source:
+       name: Paper
+       url: https://paper-url.com
+   - task:
+       type: multiple_choice
+     dataset:
+       name: eus_proficiency
+       type: EusProficiency
+     metrics:
+     - name: Accuracy (5-shot)
+       type: Accuracy (5-shot)
+       value: 60.61
+     source:
+       name: Paper
+       url: https://paper-url.com
+   - task:
+       type: multiple_choice
+     dataset:
+       name: eus_reading
+       type: EusReading
+     metrics:
+     - name: Accuracy (5-shot)
+       type: Accuracy (5-shot)
+       value: 53.69
+     source:
+       name: Paper
+       url: https://paper-url.com
+   - task:
+       type: multiple_choice
+     dataset:
+       name: eus_trivia
+       type: EusTrivia
+     metrics:
+     - name: Accuracy (5-shot)
+       type: Accuracy (5-shot)
+       value: 61.52
+     source:
+       name: Paper
+       url: https://paper-url.com
+   - task:
+       type: multiple_choice
+     dataset:
+       name: eus_exams
+       type: EusExams
+     metrics:
+     - name: Accuracy (5-shot)
+       type: Accuracy (5-shot)
+       value: 54.48
+     source:
+       name: Paper
+       url: https://paper-url.com
 ---

 # **Model Card for Latxa 70b**

 ![Latxa](latxa.jpeg)

+ We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters. Latxa is based on Llama 2, which we continue pretraining on a new Basque corpus comprising 4.3M documents and 4.2B tokens. In our extensive evaluation, Latxa outperforms all previous open models we compare to by a large margin. In addition, it is competitive with GPT-4 Turbo in language proficiency and understanding, despite lagging behind in reading comprehension and knowledge-intensive tasks. Both the Latxa family of models and our new pretraining corpora and evaluation datasets are publicly available under open licenses. Our suite enables reproducible research on methods to build LLMs for low-resource languages.

+ - 📒 Blog Post: [Latxa: An Open Language Model and Evaluation Suite for Basque](https://www.hitz.eus/en/node/340)
+ - 📖 Paper: [Latxa: An Open Language Model and Evaluation Suite for Basque](https://openreview.net/forum?id=mMqOvfqFS9)
+ - 💻 Code: [hitz-zentroa/latxa](https://github.com/hitz-zentroa/latxa)

 # **Model Details**

 ## **Model Description**

+ Latxa is a family of Large Language Models (LLM) based on Meta’s [LLaMA models](https://huggingface.co/meta-llama). Current LLMs exhibit incredible performance for high-resource languages such as English, but for Basque and other low-resource languages their performance is close to that of a random guesser. These limitations widen the gap between high- and low-resource languages when it comes to digital development. We present Latxa to overcome these limitations and to promote the development of LLM-based technology and research for the Basque language. Latxa models follow the same architecture as their original counterparts and were further trained on [Latxa Corpus v1.1](https://huggingface.co/datasets/HiTZ/latxa-corpus-v1.1), a high-quality Basque corpus.

 The models are released in three sizes: 7B, 13B and 70B.

 from transformers import pipeline

+ pipe = pipeline("text-generation", model="HiTZ/latxa-70b-v1.1")

 text = "Euskara adimen artifizialera iritsi da!"

 # **Bias, Risks, and Limitations**

+ In an effort to alleviate potentially disturbing or harmful content, Latxa has been trained on carefully selected and processed data, which comes mainly from local media, national and regional newspapers, encyclopedias and blogs (see Latxa-Corpus below). Still, the model is based on LLaMA models and can potentially carry the same biases, risks and limitations.

+ Please see LLaMA’s _Ethical Considerations and Limitations_ for further information.

 # **Training Details**

 ## **Training Data**

+ Our training corpus combines various existing datasets, as well as some new ones that we release with this work. We prioritized quality over quantity, favoring high-quality data sources and applying a thorough deduplication and filtering process. In total, a 4.17B-token corpus is used to train the model.

+ See more details in the [Latxa Corpus](https://huggingface.co/datasets/HiTZ/latxa-corpus-v1.1) dataset card.

+ Additionally, 500K documents of English data randomly selected from the [Pile](https://huggingface.co/datasets/EleutherAI/pile) dataset were included to avoid catastrophic forgetting.

 ## **Training Procedure**

+ The training of Latxa was conducted using the [GPT-Neox](https://github.com/EleutherAI/gpt-neox) library. As infrastructure, we leveraged the CINECA HPC Leonardo computing cluster located in Italy, which is powered by 3456 nodes, each containing 4x custom A100 64GB GPUs. The models were trained for 10k steps with a sequence length of 4096 tokens and an effective batch size of 2M tokens, resulting in a total of 20B tokens (around 4 epochs). We used a cosine learning rate schedule with a warm-up of 500 steps, decaying down to 3% of the peak learning rate of 1e-4. All other hyperparameters follow Llama 2 ([Touvron et al., 2023](https://arxiv.org/abs/2307.09288)).
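The schedule described above (500 warm-up steps, cosine decay to 3% of a 1e-4 peak over 10k steps) can be sketched as follows; the `latxa_lr` name and the linear warm-up shape are our assumptions, not code from the training repository:

```python
import math

def latxa_lr(step, peak_lr=1e-4, warmup_steps=500, total_steps=10_000, min_ratio=0.03):
    """Linear warm-up to peak_lr, then cosine decay to min_ratio * peak_lr.

    The 500-step warm-up, 3% floor, 1e-4 peak and 10k total steps come from
    the training description; the warm-up shape is an assumption.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    min_lr = min_ratio * peak_lr
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Peak is reached at the end of warm-up; the 3% floor at the final step.
print(latxa_lr(500), latxa_lr(10_000))
```

Plugging in a few steps confirms the shape: the rate rises linearly to 1e-4 at step 500, then decays monotonically to 3e-6 at step 10,000.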
  # **Evaluation**

 * **EpecKorrefBin**: Coreference detection task similar to WSC.
 * **QNLIeu**: Q&A NLI built from the Basque Wikipedia.
 * **WiCeu**: Basque Word-in-Context task.
+ * **EusProficiency** ([Etxaniz et al., 2024]()): EusProficiency comprises 5,169 exercises on different topics from past EGA exams, the official C1-level certificate of proficiency in Basque.
+   * Data card: [https://huggingface.co/datasets/HiTZ/EusProficiency](https://huggingface.co/datasets/HiTZ/EusProficiency)
+ * **EusReading** ([Etxaniz et al., 2024]()): EusReading consists of 352 reading comprehension exercises (_irakurmena_) sourced from the same set of past EGA exams.
+   * Data card: [https://huggingface.co/datasets/HiTZ/EusReading](https://huggingface.co/datasets/HiTZ/EusReading)
+ * **EusTrivia** ([Etxaniz et al., 2024]()): EusTrivia consists of 1,715 trivia questions from multiple online sources. 56.3% of the questions are elementary level (grades 3-6), while the rest are considered challenging.
+   * Data card: [https://huggingface.co/datasets/HiTZ/EusTrivia](https://huggingface.co/datasets/HiTZ/EusTrivia)
+ * **EusExams** ([Etxaniz et al., 2024]()): EusExams is a collection of tests designed to prepare individuals for Public Service examinations conducted by several Basque institutions, including the public health system Osakidetza, the Basque Government, the City Councils of Bilbao and Gasteiz, and the University of the Basque Country (UPV/EHU).
+   * Data card: [https://huggingface.co/datasets/HiTZ/EusExams](https://huggingface.co/datasets/HiTZ/EusExams)
 ### **Metrics**

+ Accuracy is used for most of the tasks, as they are framed as multiple-choice questions. For the rest, particularly tasks from the BasqueGLUE benchmark, we used the following:

 * **Micro F1**: BEC2016-eu and BHTCv2
 * **Macro F1**: VaxxStance (favor & against)
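The micro/macro distinction can be sketched in a few lines of Python; the `f1_scores` helper and the stance labels below are hypothetical illustrations, not code or data from the evaluation harness:

```python
def f1_scores(y_true, y_pred, labels):
    """Micro and macro F1 for single-label classification.

    Macro F1 averages per-class F1 (each class weighs equally, as for
    VaxxStance); micro F1 pools counts over classes and, for single-label
    tasks, equals plain accuracy (as for BEC2016-eu and BHTCv2).
    """
    per_class_f1 = []
    correct = 0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        correct += tp
    macro_f1 = sum(per_class_f1) / len(labels)
    micro_f1 = correct / len(y_true)  # pooled TP / N for single-label tasks
    return micro_f1, macro_f1

# Hypothetical stance predictions, for illustration only.
gold = ["favor", "against", "against", "neutral"]
pred = ["favor", "against", "favor", "neutral"]
micro, macro = f1_scores(gold, pred, ["favor", "against", "neutral"])
```

Here micro F1 is 0.75 (3 of 4 correct), while macro F1 is lower-weighted toward the imperfect classes.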

 The model was evaluated using the LM Evaluation Harness library from Eleuther AI. In order to reproduce our results, please refer to our [fork](https://github.com/naiarapm/lm-evaluation-harness/tree/basqueglue), which includes the implementation for the mentioned datasets.

+ | Model          | Size | XStory    | Belebele  | BasGLUE   | EusProf   | EusRead   | EusTrivia | EusExams  | Avg       |
+ |----------------|------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
+ | **Random**     |      | 50.00     | 25.00     | 37.50     | 25.00     | 25.83     | 26.55     | 25.00     | 30.70     |
+ | GPT 3.5 Turbo  | n/a  | --        | 57.33     | 48.62     | 31.24     | 36.65     | 46.71     | 42.42     | --        |
+ | GPT 4 Turbo    | n/a  | --        | **90.67** | **62.90** | **56.70** | **75.85** | **73.12** | **70.22** | --        |
+ | XGLM           | 7B   | 57.71     | 23.88     | 41.47     | 22.96     | 24.43     | 26.53     | 24.59     | 32.51     |
+ | BLOOM          | 7B   | 57.18     | 27.00     | 40.17     | 25.34     | 28.41     | 27.17     | 25.07     | 33.86     |
+ | Mistral        | 7B   | 51.09     | **38.89** | 39.22     | 25.01     | 29.26     | 34.58     | 32.15     | 35.94     |
+ | Llama 2        | 7B   | 50.43     | 26.22     | 38.20     | 24.09     | 27.27     | 29.50     | 28.84     | 32.51     |
+ | **Latxa v1**   | 7B   | 63.13     | 35.67     | 50.26     | 28.19     | 27.27     | 40.17     | 34.18     | 39.84     |
+ | **Latxa v1.1** | 7B   | **65.72** | 36.89     | **51.78** | **32.44** | **30.40** | **44.37** | **34.20** | **42.26** |
+ | mGPT           | 13B  | 55.39     | 25.00     | 37.56     | 25.00     | 24.15     | 27.17     | 25.73     | 32.14     |
+ | Llama 2        | 13B  | 50.63     | 32.00     | 38.98     | 25.90     | 28.98     | 33.53     | 29.66     | 34.36     |
+ | **Latxa v1**   | 13B  | 65.85     | **53.56** | **54.49** | 41.19     | **40.06** | 51.14     | 42.92     | **49.95** |
+ | **Latxa v1.1** | 13B  | **67.24** | 51.56     | 54.04     | **45.02** | 29.83     | **56.44** | **43.18** | 49.62     |
+ | Mixtral        | 8x7B | 52.55     | 50.44     | 45.00     | 26.43     | 37.50     | 42.51     | 39.87     | 41.97     |
+ | Yi             | 34B  | 52.22     | 54.56     | 43.90     | 27.30     | 34.66     | 42.57     | 39.68     | 42.05     |
+ | Llama 2        | 70B  | 51.62     | 33.56     | 42.55     | 24.16     | 27.84     | 38.43     | 33.08     | 35.47     |
+ | **Latxa v1**   | 70B  | 67.57     | **71.78** | 59.37     | 48.19     | 49.72     | 57.84     | 51.68     | 58.02     |
+ | **Latxa v1.1** | 70B  | **69.76** | 64.89     | **61.66** | **60.61** | **53.69** | **61.52** | **54.48** | **60.94** |

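As a quick arithmetic check (our own, not part of the card), the Avg column is the unweighted mean of the seven task scores; for the Latxa v1.1 70B row:

```python
# Scores for the Latxa v1.1 70B row, in table order:
# XStory, Belebele, BasGLUE, EusProf, EusRead, EusTrivia, EusExams.
latxa_70b_scores = [69.76, 64.89, 61.66, 60.61, 53.69, 61.52, 54.48]
avg = sum(latxa_70b_scores) / len(latxa_70b_scores)
print(round(avg, 2))  # 60.94, matching the reported Avg
```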
 # **Environmental Impact**

 Carbon emissions are estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+ | Model      | Size | Time (GPU hours) | Carbon emitted (kg CO<sub>2</sub> eq) |
+ |------------|------|------------------|---------------------------------------|
+ | Latxa v1.1 | 7B   | 1,895.4h         | 247.69kg                              |
+ | Latxa v1.1 | 13B  | 2,518.0h         | 329.06kg                              |
+ | Latxa v1.1 | 70B  | 30,266.0h        | 3,955.17kg                            |
+ | Total      | -    | 34,679.4h        | 4,531.92kg                            |

 * **Hardware Type:** HPC Cluster, 4x A100 64GB nodes
+ * **Hours used:** 34,679.4h
 * **Compute cluster:** CINECA HPC
 * **Compute Region:** Italy
+ * **Carbon Emitted:** 4,531.92kg CO<sub>2</sub> eq

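The totals in the bullets are consistent with the per-model rows of the table; a quick check (the emission factor on the last line is our own derived figure, not reported in the card):

```python
# Per-model GPU-hours and emissions as reported for Latxa v1.1.
gpu_hours = {"7B": 1895.4, "13B": 2518.0, "70B": 30266.0}
co2_kg = {"7B": 247.69, "13B": 329.06, "70B": 3955.17}

total_hours = sum(gpu_hours.values())
total_co2 = sum(co2_kg.values())
print(round(total_hours, 1), round(total_co2, 2))  # 34679.4 4531.92

# Implied average emission factor in kg CO2 eq per GPU-hour (derived).
print(round(total_co2 / total_hours, 3))
```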

 # **Acknowledgements**

+ This work has been partially supported by the Basque Government (IKER-GAITU project). The models were trained on the Leonardo supercomputer at CINECA under the EuroHPC Joint Undertaking, project EHPC-EXT-2023E01-013.
+
+ # **Citation**
+
+ To cite our work, please use:
+
+ ```bibtex
+ @misc{etxaniz2024latxa,
+   title={{L}atxa: An Open Language Model and Evaluation Suite for {B}asque},
+   author={Julen Etxaniz and Oscar Sainz and Naiara Perez and Itziar Aldabe and German Rigau and Eneko Agirre and Aitor Ormazabal and Mikel Artetxe and Aitor Soroa},
+   year={2024},
+   eprint={},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```