updated README
README.md
CHANGED
@@ -957,7 +957,7 @@ Score 1: The answer is mathematically correct, with accurate calculations and ap
 
 #### Multilingual results
 
-Here, we present results for seven categories of tasks in Spanish, Catalan, Basque, Galician, and English. Results are presented for each task, criterion and language. Criteria with a `(B)` after their name are binary criteria (i.e., numbers go from 0 to 1, where 1 is best). The rest of the criteria are measured using a 5-point Likert scale, where 5 is best. The first number of the pair of numbers separated by `/` shows the average score for the criterion (and language). The second number of each pair is the robustness score, where numbers closer to 0
+Here, we present results for seven categories of tasks in Spanish, Catalan, Basque, Galician, and English. Results are presented for each task, criterion and language. Criteria with a `(B)` after their name are binary criteria (i.e., numbers go from 0 to 1, where 1 is best). The rest of the criteria are measured using a 5-point Likert scale, where 5 is best. The first number of the pair of numbers separated by `/` shows the average score for the criterion (and language). The second number of each pair is the robustness score, where numbers closer to 0 mean that the model generates similar responses when comparing the three prompt varieties for a single instance.
 
 Further details on all tasks and criteria, a full list of results compared to other baselines, a discussion of the model's performance across tasks and its implications, and details regarding problem-solving with task implementation will soon be available in the technical report.
 
@@ -1174,4 +1174,4 @@ Technical report coming soon.
 |:---:|:---:|:---:|
 |2B| [Link](https://huggingface.co/BSC-LT/salamandra-2b) | [Link](https://huggingface.co/BSC-LT/salamandra-2b-instruct) |
 |7B| [Link](https://huggingface.co/BSC-LT/salamandra-7b) | [Link](https://huggingface.co/BSC-LT/salamandra-7b-instruct) |
-|40B| [Link](https://huggingface.co/BSC-LT/ALIA-40b) | WiP |
+|40B| [Link](https://huggingface.co/BSC-LT/ALIA-40b) | WiP |