joanllop commited on
Commit
15b97e8
·
1 Parent(s): 69ab8e2

updated README

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -957,7 +957,7 @@ Score 1: The answer is mathematically correct, with accurate calculations and ap
957
 
958
  #### Multilingual results
959
 
960
- Here, we present results for seven categories of tasks in Spanish, Catalan, Basque, Galician, and English. Results are presented for each task, criterion and language. Criteria with a `(B)` after their name are binary criteria (i.e., numbers go from 0 to 1, where 1 is best). The rest of the criteria are measured using a 5-point Likert scale, where 5 is best. The first number of the pair of numbers separated by `/` shows the average score for the criterion (and language). The second number of each pair is the robustness score, where numbers closer to 0 means that the model generates similar responses when comparing the three prompt varieties for a single instance.
961
 
962
  Further details on all tasks and criteria, a full list of results compared to other baselines, a discussion of the model's performance across tasks and its implications, and details regarding problem-solving with task implementation will soon be available in the technical report.
963
 
@@ -1174,4 +1174,4 @@ Technical report coming soon.
1174
  |:---:|:---:|:---:|
1175
  |2B| [Link](https://huggingface.co/BSC-LT/salamandra-2b) | [Link](https://huggingface.co/BSC-LT/salamandra-2b-instruct) |
1176
  |7B| [Link](https://huggingface.co/BSC-LT/salamandra-7b) | [Link](https://huggingface.co/BSC-LT/salamandra-7b-instruct) |
1177
- |40B| [Link](https://huggingface.co/BSC-LT/ALIA-40b) | WiP |
 
957
 
958
  #### Multilingual results
959
 
960
+ Here, we present results for seven categories of tasks in Spanish, Catalan, Basque, Galician, and English. Results are presented for each task, criterion and language. Criteria with a `(B)` after their name are binary criteria (i.e., numbers go from 0 to 1, where 1 is best). The rest of the criteria are measured using a 5-point Likert scale, where 5 is best. The first number of the pair of numbers separated by `/` shows the average score for the criterion (and language). The second number of each pair is the robustness score, where numbers closer to 0 mean that the model generates similar responses when comparing the three prompt varieties for a single instance.
961
 
962
  Further details on all tasks and criteria, a full list of results compared to other baselines, a discussion of the model's performance across tasks and its implications, and details regarding problem-solving with task implementation will soon be available in the technical report.
963
 
 
1174
  |:---:|:---:|:---:|
1175
  |2B| [Link](https://huggingface.co/BSC-LT/salamandra-2b) | [Link](https://huggingface.co/BSC-LT/salamandra-2b-instruct) |
1176
  |7B| [Link](https://huggingface.co/BSC-LT/salamandra-7b) | [Link](https://huggingface.co/BSC-LT/salamandra-7b-instruct) |
1177
+ |40B| [Link](https://huggingface.co/BSC-LT/ALIA-40b) | WiP |