BSC-LT/salamandra-7b-instruct

joanllop committed 18 days ago

Commit 15b97e8 · 1 Parent(s): 69ab8e2

updated README

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -957,7 +957,7 @@ Score 1: The answer is mathematically correct, with accurate calculations and ap
 #### Multilingual results
-Here, we present results for seven categories of tasks in Spanish, Catalan, Basque, Galician, and English. Results are presented for each task, criterion and language. Criteria with a `(B)` after their name are binary criteria (i.e., numbers go from 0 to 1, where 1 is best). The rest of the criteria are measured using a 5-point Likert scale, where 5 is best. The first number of the pair of numbers separated by `/` shows the average score for the criterion (and language). The second number of each pair is the robustness score, where numbers closer to 0 means that the model generates similar responses when comparing the three prompt varieties for a single instance.
 Further details on all tasks and criteria, a full list of results compared to other baselines, a discussion of the model's performance across tasks and its implications, and details regarding problem-solving with task implementation will soon be available in the technical report.
@@ -1174,4 +1174,4 @@ Technical report coming soon.
 |:---:|:---:|:---:|
 |2B| [Link](https://huggingface.co/BSC-LT/salamandra-2b) | [Link](https://huggingface.co/BSC-LT/salamandra-2b-instruct) |
 |7B| [Link](https://huggingface.co/BSC-LT/salamandra-7b) | [Link](https://huggingface.co/BSC-LT/salamandra-7b-instruct) |
-|40B| [Link](https://huggingface.co/BSC-LT/ALIA-40b) | WiP |

 #### Multilingual results
+Here, we present results for seven categories of tasks in Spanish, Catalan, Basque, Galician, and English. Results are presented for each task, criterion and language. Criteria with a `(B)` after their name are binary criteria (i.e., numbers go from 0 to 1, where 1 is best). The rest of the criteria are measured using a 5-point Likert scale, where 5 is best. The first number of the pair of numbers separated by `/` shows the average score for the criterion (and language). The second number of each pair is the robustness score, where numbers closer to 0 mean that the model generates similar responses when comparing the three prompt varieties for a single instance.
 Further details on all tasks and criteria, a full list of results compared to other baselines, a discussion of the model's performance across tasks and its implications, and details regarding problem-solving with task implementation will soon be available in the technical report.
 |:---:|:---:|:---:|
 |2B| [Link](https://huggingface.co/BSC-LT/salamandra-2b) | [Link](https://huggingface.co/BSC-LT/salamandra-2b-instruct) |
 |7B| [Link](https://huggingface.co/BSC-LT/salamandra-7b) | [Link](https://huggingface.co/BSC-LT/salamandra-7b-instruct) |
+|40B| [Link](https://huggingface.co/BSC-LT/ALIA-40b) | WiP |
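
The "Multilingual results" paragraph touched by this commit describes each result cell as a pair of numbers separated by `/`: the average score, then the robustness score. A minimal sketch of reading such a cell, assuming the cell is plain text like `3.50 / 0.25`. The names `ScorePair`, `parse_score_pair`, `is_binary_criterion`, and `max_average` are hypothetical, and the example values are made up for illustration, not taken from the README:

```python
from typing import NamedTuple


class ScorePair(NamedTuple):
    """One result cell: average score and robustness score (hypothetical type)."""
    average: float
    robustness: float


def parse_score_pair(cell: str) -> ScorePair:
    """Split a cell such as '3.50 / 0.25' into its two numbers."""
    # Unpacking raises ValueError if the cell does not contain exactly one '/'.
    average_text, robustness_text = cell.split("/")
    return ScorePair(float(average_text), float(robustness_text))


def is_binary_criterion(name: str) -> bool:
    """Criteria marked with '(B)' after their name are binary (0 to 1)."""
    return name.strip().endswith("(B)")


def max_average(name: str) -> float:
    """Best possible average: 1 for binary criteria, 5 on the Likert scale."""
    return 1.0 if is_binary_criterion(name) else 5.0


# Illustrative, made-up cell and criterion name.
pair = parse_score_pair("3.50 / 0.25")
print(pair.average, pair.robustness, max_average("Accuracy (B)"))
```

A robustness value closer to 0 indicates that the model's responses were similar across the three prompt varieties; how that value is computed is not stated in the README, so the sketch only parses it.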