gonzalez-agirre committed
Commit abbe4e5 • 1 Parent(s): 484c277
Update README.md
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
 
 - "masked-lm"
 
-- "RoBERTa-large-ca"
+- "RoBERTa-large-ca-v2"
 
 - "CaText"
 
@@ -29,7 +29,7 @@ widget:
 
 ---
 
-# Catalan BERTa (roberta-large-ca) large model
+# Catalan BERTa (roberta-large-ca-v2) large model
 
 ## Table of Contents
 <details>
@@ -53,13 +53,13 @@ widget:
 
 ## Model description
 
-The **roberta-large-ca** is a transformer-based masked language model for the Catalan language.
+The **roberta-large-ca-v2** is a transformer-based masked language model for the Catalan language.
 It is based on the [RoBERTA](https://github.com/pytorch/fairseq/tree/master/examples/roberta) large model
 and has been trained on a medium-size corpus collected from publicly available corpora and crawlers.
 
 ## Intended Uses and Limitations
 
-**roberta-large-ca** model is ready-to-use only for masked language modeling to perform the Fill Mask task (try the inference API or read the next section).
+**roberta-large-ca-v2** model is ready-to-use only for masked language modeling to perform the Fill Mask task (try the inference API or read the next section).
 However, it is intended to be fine-tuned on non-generative downstream tasks such as Question Answering, Text Classification, or Named Entity Recognition.
 
 ## How to Use
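The Intended Uses hunk above says the model is meant to be fine-tuned on downstream tasks such as Text Classification. A minimal sketch of what that could look like with the renamed checkpoint, assuming a toy two-sentence dataset, two labels, and default training settings (the sentences, labels, and output directory are illustrative assumptions, not part of the model card):

```python
# Minimal fine-tuning sketch for a downstream text-classification task.
# The sentences, labels, and hyperparameters below are illustrative assumptions.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "projecte-aina/roberta-large-ca-v2"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["M'agrada molt aquest llibre.", "No m'ha agradat gens la pel·lícula."]
labels = [1, 0]  # toy labels: 1 = positive, 0 = negative

class ToyDataset(Dataset):
    """Wraps tokenized sentences and labels for the Trainer."""
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-large-ca-v2-cls",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ToyDataset(texts, labels),
)
trainer.train()
```

For a real experiment, the toy dataset would be replaced by one of the Catalan benchmarks listed in the evaluation table below (e.g. TeCla for text classification).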
@@ -70,8 +70,8 @@ Here is how to use this model:
 from transformers import AutoModelForMaskedLM
 from transformers import AutoTokenizer, FillMaskPipeline
 from pprint import pprint
-tokenizer_hf = AutoTokenizer.from_pretrained('projecte-aina/roberta-large-ca')
-model = AutoModelForMaskedLM.from_pretrained('projecte-aina/roberta-large-ca')
+tokenizer_hf = AutoTokenizer.from_pretrained('projecte-aina/roberta-large-ca-v2')
+model = AutoModelForMaskedLM.from_pretrained('projecte-aina/roberta-large-ca-v2')
 model.eval()
 pipeline = FillMaskPipeline(model, tokenizer_hf)
 text = f"Em dic <mask>."
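The hunk above ends at README line 77, before the pipeline is actually invoked. Assembled end to end with the renamed checkpoint, and with an illustrative completion that runs the pipeline and prints the predictions (the `res` variable and the final pprint call are assumptions, not lines shown in this diff), the snippet reads:

```python
# Fill-mask usage with the renamed checkpoint; mirrors the updated hunk above.
# The last two lines (running the pipeline and printing predictions) are an
# illustrative completion and do not appear in the diff itself.
from pprint import pprint
from transformers import AutoModelForMaskedLM, AutoTokenizer, FillMaskPipeline

tokenizer_hf = AutoTokenizer.from_pretrained('projecte-aina/roberta-large-ca-v2')
model = AutoModelForMaskedLM.from_pretrained('projecte-aina/roberta-large-ca-v2')
model.eval()

pipeline = FillMaskPipeline(model, tokenizer_hf)
text = "Em dic <mask>."  # "My name is <mask>."
res = pipeline(text)
pprint([(pred["token_str"], round(pred["score"], 4)) for pred in res])
```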
@@ -171,7 +171,7 @@ Here are the train/dev/test splits of the datasets:
 
 | Task | NER (F1) | POS (F1) | STS-ca (Comb) | TeCla (Acc.) | TEca (Acc.) | VilaQuAD (F1/EM) | ViquiQuAD (F1/EM) | CatalanQA (F1/EM) | XQuAD-ca <sup>1</sup> (F1/EM) |
 | ------------|:-------------:| -----:|:------|:------|:-------|:------|:----|:----|:----|
-| RoBERTa-large-ca | **89.82** | **99.02** | **83.41** | **75.46** | **83.61** | **89.34/75.50** | **89.20**/75.77 | **90.72/79.06** | **73.79**/55.34 |
+| RoBERTa-large-ca-v2 | **89.82** | **99.02** | **83.41** | **75.46** | **83.61** | **89.34/75.50** | **89.20**/75.77 | **90.72/79.06** | **73.79**/55.34 |
 | RoBERTa-base-ca-v2 | 89.29 | 98.96 | 79.07 | 74.26 | 83.14 | 87.74/72.58 | 88.72/**75.91** | 89.50/76.63 | 73.64/**55.42** |
 | BERTa | 89.76 | 98.96 | 80.19 | 73.65 | 79.26 | 85.93/70.58 | 87.12/73.11 | 89.17/77.14 | 69.20/51.47 |
 | mBERT | 86.87 | 98.83 | 74.26 | 69.90 | 74.63 | 82.78/67.33 | 86.89/73.53 | 86.90/74.19 | 68.79/50.80 |