v1.1

Files changed:
- .gitattributes +3 -0
- README.md +42 -28
- config.json +2 -2
- model.safetensors +1 -1
.gitattributes
CHANGED
@@ -34,3 +34,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
images/salamandra_header.png filter=lfs diff=lfs merge=lfs -text
+model.safetensors filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+tokenizer.model filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -146,7 +146,7 @@ The accelerated partition is composed of 1,120 nodes with the following specific
The instruction-following models use the commonly adopted ChatML template:

```jinja
-{%- if
+{%- if messages[0]['role'] == 'system' %}{%- set system_message = messages[0]['content'] %}{%- set loop_messages = messages[1:] %}{%- else %}{%- set system_message = 'SYSTEM MESSAGE' %}{%- set loop_messages = messages %}{%- endif %}{%- if not date_string is defined %}{%- set date_string = '2024-09-30' %}{%- endif %}{{ '<|im_start|>system\n' + system_message + '<|im_end|>\n' }}{% for message in loop_messages %}{%- if (message['role'] != 'user') and (message['role'] != 'assistant')%}{{ raise_exception('Only user and assistant roles are supported after the initial optional system message.') }}{% endif %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
```
Where `system_message` is used to guide the model during generation and `date_string` can be set to allow the model to respond with the current date.

@@ -607,31 +607,34 @@ The dataset does not allow for external contributions.

### Finetuning Data

-This
-
-
-
-
-
-
-| dolly-
-
-
-
-
-
-
-
-
-
-
+This instruction-tuned variant has been fine-tuned with a collection of 273k instructions, focusing on the performance of Catalan, English and Spanish. However, instruction data for other closely related Iberian languages has also been included, since it yielded a positive impact on the languages of interest. That said, the performance in these additional languages is not guaranteed due to the limited amount of available data and the lack of resources for thorough testing.
+
+| **Dataset** | **ca** | **en** | **es** | **eu** | **gl** | **pt** | **Total** |
+|----------------------|------------|-------------|------------|-----------|---------|------------|-------------|
+| alpaca-cleaned | | 49,950 | | | | | **49,950** |
+| aya-dataset | | 3,941 | 3,851 | 939 | | 8,995 | **17,726** |
+| coqcat | 4,797 | | | | | | **4,797** |
+| databricks-dolly-15k | | 15,011 | | | | | **15,011** |
+| dolly-ca | 3,232 | | | | | | **3,232** |
+| flores-dev | 986 | 1,037 | 1,964 | 493 | 505 | | **4,985** |
+| mentor-ca | 7,119 | | | | | | **7,119** |
+| mentor-es | | | 7,122 | | | | **7,122** |
+| no-robots | | 9,485 | | | | | **9,485** |
+| oasst-ca | 2,517 | | | | | | **2,517** |
+| oasst2 | 750 | 31,086 | 15,438 | 190 | 197 | 1,203 | **48,864** |
+| open-orca | | 49,996 | | | | | **49,996** |
+| rag-multilingual | 16,043 | 14,997 | 11,263 | | | | **42,303** |
+| tower-blocks | | 7,762 | 1,000 | | | 1,000 | **9,762** |
+| **Total** | **35,444** | **183,265** | **40,638** | **1,622** | **702** | **11,198** | **272,869** |

---

+
## Evaluation

### Gold-standard benchmarks
-
+WiP
+<!--
Evaluation is done using the Language Model Evaluation Harness (Gao et al., 2024). We evaluate on a set of tasks taken from [SpanishBench](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/spanish_bench), [CatalanBench](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/catalan_bench), [BasqueBench](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/basque_bench) and [GalicianBench](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/galician_bench). These benchmarks include both new and existing tasks and datasets. Given that this is an instructed model, we add LM Evaluation Harness's native feature of `chat-template` to the setup. In the tables below, we include the results in a selection of evaluation datasets that represent the model's performance across a variety of tasks within these benchmarks.

We only use tasks that are either human generated, human translated, or with a strong human-in-the-loop (i.e., machine translation followed by professional revision or machine generation followed by human revision and annotation). This is the reason behind the variety in number of tasks reported across languages. As more tasks that fulfill these requirements are published, we will update the presented results. We also intend to expand the evaluation to other languages, as long as the datasets meet our quality standards.

@@ -866,6 +869,7 @@ All results reported below are on a 0-shot setting.

</tr>
</tbody>
</table>
+-->

### LLM-as-a-judge

@@ -1087,13 +1091,23 @@ Further details on all tasks and criteria, a full list of results compared to ot

## Ethical Considerations and Limitations

-We examine the presence of undesired societal and cognitive biases present in this model using different benchmarks. For societal biases,
+We examine the presence of undesired societal and cognitive biases present in this model using different benchmarks. For societal biases,
+we test performance using the BBQ dataset (Parrish et al., 2022) in the original English and the Regard dataset (Sheng et al., 2019).
+We report that while performance is high (accuracies around 0.8 depending on the social category) in disambiguated settings,
+the model performs very poorly in ambiguous settings, which indicates the presence of societal biases that need to be further addressed in post-training phases.

-Our cognitive bias analysis focuses on positional effects in 0-shot settings, and majority class bias in few-shot settings.
+Our cognitive bias analysis focuses on positional effects in 0-shot settings, and majority class bias in few-shot settings.
+For positional effects, we leverage the ARC Multiple Choice Question dataset (Clark et al., 2018). We observe significant,
+but relatively weak primacy effects, whereby the model shows a preference for answers towards the beginning of the list of provided answers.
+We measure majority class effects in few-shot settings using SST-2 (Socher et al., 2013). We again detect significant effects,
+with a small effect size. This suggests that the model is relatively robust against the examined cognitive biases.

-We highlight that our analyses of these biases are by no means exhaustive and are limited by the relative scarcity of adequate resources
-
-
+We highlight that our analyses of these biases are by no means exhaustive and are limited by the relative scarcity of adequate resources
+in all languages present in the training data. We aim to gradually extend and expand our analyses in future work.
+
+These results can be expected from a model that has undergone only a preliminary instruction tuning.
+These tests are performed in order to show the biases the model may contain. We urge developers to take
+them into account and perform safety testing and tuning tailored to their specific applications of the model.

---

@@ -1120,7 +1134,7 @@ This project has benefited from the contributions of numerous teams and institut

In Catalonia, many institutions have been involved in the project. Our thanks to Òmnium Cultural, Parlament de Catalunya, Institut d'Estudis Aranesos, Racó Català, Vilaweb, ACN, Nació Digital, El món and Aquí Berguedà.

-At national level, we are especially grateful to our ILENIA project partners: CENID, HiTZ and CiTIUS for their participation. We also extend our genuine gratitude to the Spanish Senate and Congress, Fundación Dialnet, Fundación Elcano and the ‘Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)’ of the University of Las Palmas de Gran Canaria.
+At the national level, we are especially grateful to our ILENIA project partners: CENID, HiTZ and CiTIUS for their participation. We also extend our genuine gratitude to the Spanish Senate and Congress, Fundación Dialnet, Fundación Elcano and the ‘Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (SIANI)’ of the University of Las Palmas de Gran Canaria.

At the international level, we thank the Welsh government, DFKI, Occiglot project, especially Malte Ostendorff, and The Common Crawl Foundation, especially Pedro Ortiz, for their collaboration. We would also like to give special thanks to the NVIDIA team, with whom we have met regularly, especially to: Ignacio Sarasua, Adam Henryk Grzywaczewski, Oleg Sudakov, Sergio Perez, Miguel Martinez, Felipes Soares and Meriem Bendris. Their constant support has been especially appreciated throughout the entire process.

@@ -1136,7 +1150,7 @@ The Barcelona Supercomputing Center, as the owner and creator of the model, shal

### Citation

-Technical report
+Technical report coming soon.

### License
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

@@ -1146,4 +1160,4 @@ Technical report and paper coming soon.

|:---:|:---:|:---:|
|2B| [Link](https://huggingface.co/BSC-LT/salamandra-2b) | [Link](https://huggingface.co/BSC-LT/salamandra-2b-instruct) |
|7B| [Link](https://huggingface.co/BSC-LT/salamandra-7b) | [Link](https://huggingface.co/BSC-LT/salamandra-7b-instruct) |
-|40B| [Link](https://huggingface.co/BSC-LT/ALIA-40b) | WiP |
+|40B| [Link](https://huggingface.co/BSC-LT/ALIA-40b) | WiP |
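The ChatML template added in the README hunk above is applied via the tokenizer's `apply_chat_template` method; extra keyword arguments such as `date_string` are forwarded to the Jinja template. The sketch below is illustrative only, and the repo id `BSC-LT/salamandra-2b-instruct` is an assumption based on the links in the model card.

```python
# Hedged sketch: render a ChatML prompt with the chat template shown above.
# Assumes this commit belongs to the salamandra-2b-instruct repo linked in the card.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BSC-LT/salamandra-2b-instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What day is it today?"},
]

# Extra keyword arguments such as `date_string` are passed through to the
# Jinja template, overriding the '2024-09-30' default it defines.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    date_string="2025-01-15",
)
print(prompt)
```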
config.json
CHANGED
@@ -1,5 +1,5 @@
{
-  "_name_or_path": "
+  "_name_or_path": "/gpfs/projects/bsc88/text/models/instruction-tuning/models/base_models_with_special_tokens/restart_mix1_all_fineweb_2b_new_data_hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
@@ -7,6 +7,7 @@
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
+  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
@@ -17,7 +18,6 @@
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
-  "num_layers": 24,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
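As a quick sanity check on the config change above (an explicit `head_dim`, removal of the redundant `num_layers` key), the loaded config can be inspected directly. This is only a sketch; the repo id is again an assumption.

```python
# Hedged sketch: confirm the fields touched by this config.json change.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("BSC-LT/salamandra-2b-instruct")  # assumed repo id

# head_dim is now explicit and should equal hidden_size // num_attention_heads.
assert config.head_dim == config.hidden_size // config.num_attention_heads == 128

# The duplicate "num_layers" key was dropped; num_hidden_layers remains the
# field actually read by LlamaForCausalLM.
print(config.num_hidden_layers)  # 24
```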
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:9aabb07d19e1cafe7b9f4bdff98bacc7c9f325a829505a2e376140568c227490
size 4507005744
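The model.safetensors change only swaps the Git LFS pointer, i.e. the sha256 oid of the weight file at this revision. Below is a small sketch for checking a local download against that oid; the repo id is an assumption, and the hash only matches the file as of this commit.

```python
# Hedged sketch: verify a downloaded model.safetensors against the LFS oid above.
import hashlib

from huggingface_hub import hf_hub_download

EXPECTED_SHA256 = "9aabb07d19e1cafe7b9f4bdff98bacc7c9f325a829505a2e376140568c227490"

# Assumed repo id; pass revision=... to pin the exact commit this pointer belongs to.
path = hf_hub_download("BSC-LT/salamandra-2b-instruct", "model.safetensors")

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

print(digest.hexdigest() == EXPECTED_SHA256)
```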