lcampillos committed
Commit: 95bde2d
Parent(s): a0c23f2

Update README.md

Files changed (1): README.md (+16 −19)
@@ -2,8 +2,6 @@
 license: cc-by-nc-4.0
 tags:
 - generated_from_trainer
-language:
-- es
 metrics:
 - precision
 - recall
@@ -13,9 +11,8 @@ model-index:
 - name: roberta-es-clinical-trials-neg-spec
   results: []
 widget:
-- text: "Pacientes sanos, sin ninguna enfermedad, que no tomen medicamentos"
+- text: "Pacientes sanos, sin ninguna enfermedad, que no tomen ningún medicamento"
 - text: "Sujetos adultos con cáncer de próstata asintomáticos y no tratados previamente"
-- text: "Enfermedades con posibles síntomas de urticaria o angioedema"
 ---

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -25,21 +22,21 @@ should probably proofread and complete it, then remove this comment. -->

 This named entity recognition model detects negation and speculation entities, and negated and speculated concepts:
 - Neg_cue: negation cue (e.g. *no*, *sin*)
-- Negated: negated entity or event (e.g. *sin **dolor***)
+- Negated: negated entity or event (e.g. *sin* **dolor**)
 - Spec_cue: speculation cue (e.g. *posiblemente*)
-- Speculated: speculated entity or event (e.g. *posiblemente **sobreviva***)
+- Speculated: speculated entity or event (e.g. *posiblemente* **sobreviva**)

 The model achieves the following results on the test set (when trained with the training and development set; results are averaged over 5 evaluation rounds):
-- Precision: 0.838 (±0.003)
+- Precision: 0.840 (±0.003)
 - Recall: 0.866 (±0.005)
-- F1: 0.852 (±0.003)
-- Accuracy: 0.986 (±0.001)
+- F1: 0.853 (±0.004)
+- Accuracy: 0.985 (±0.001)

 ## Model description

 This model adapts the pre-trained model [bsc-bio-ehr-es](https://huggingface.co/PlanTL-GOB-ES/bsc-bio-ehr-es), presented in [Pio Carriño et al. (2022)](https://aclanthology.org/2022.bionlp-1.19/).
 It is fine-tuned to conduct medical named entity recognition on Spanish texts about clinical trials.
-The model is fine-tuned on the [NUBEs corpus (Lima et al. 2020)](https://aclanthology.org/2020.lrec-1.708/) and on the [CT-EBM-SP corpus (Campillos-Llanos et al. 2021)](https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-021-01395-z).
+The model is fine-tuned on the [NUBEs corpus (Lima et al. 2020)](https://aclanthology.org/2020.lrec-1.708/) and on the [CT-EBM-ES corpus (Campillos-Llanos et al. 2021)](https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-021-01395-z).

 ## Intended uses & limitations

@@ -64,15 +61,15 @@ El propietario o creador de los modelos de ningún modo será responsable de los

 The data used for fine-tuning are:

-1) The [Negation and Uncertainty in Spanish Corpus (NUBes)](https://github.com/Vicomtech/NUBes-negation-uncertainty-biomedical-corpus):
+1) The [Negation and Uncertainty in Spanish Corpus (NUBes)](https://github.com/Vicomtech/NUBes-negation-uncertainty-biomedical-corpus)
 It is a collection of 29 682 sentences (518 068 tokens) from anonymised health records in Spanish, annotated with negation and uncertainty cues and their scopes.

-2) The [Clinical Trials for Evidence-Based-Medicine in Spanish corpus](http://www.lllf.uam.es/ESP/nlpdata/wp2/):
+2) The [Clinical Trials for Evidence-Based-Medicine in Spanish corpus](http://www.lllf.uam.es/ESP/nlpdata/wp2/).
 It is a collection of 1200 texts about clinical trials studies and clinical trials announcements:
 - 500 abstracts from journals published under a Creative Commons license, e.g. available in PubMed or the Scientific Electronic Library Online (SciELO)
 - 700 clinical trials announcements published in the European Clinical Trials Register and Repositorio Español de Estudios Clínicos

-If you use the CT-EBM-SP resource, please, cite as follows:
+If you use the CT-EBM-ES resource, please, cite as follows:

 ```
 @article{campillosetal-midm2021,
@@ -100,24 +97,24 @@ The following hyperparameters were used during training:
 - seed: we used different seeds for 5 evaluation rounds, and uploaded the model with the best results
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 8
+- num_epochs: average 10.5 epochs (±1.9); trained with early stopping if no improvement after 5 epochs (early stopping patience: 5)


 ### Training results (test set; average and standard deviation of 5 rounds with different seeds)

 | Precision | Recall | F1 | Accuracy |
 |:--------------:|:--------------:|:--------------:|:--------------:|
-| 0.838 (±0.003) | 0.866 (±0.005) | 0.852 (±0.003) | 0.986 (±0.001) |
+| 0.840 (±0.003) | 0.866 (±0.005) | 0.853 (±0.004) | 0.985 (±0.001) |


 **Results per class (test set; average and standard deviation of 5 rounds with different seeds)**

 | Class | Precision | Recall | F1 | Support |
 |:-----------:|:--------------:|:--------------:|:--------------:|:---------:|
-| Neg_cue | 0.945 (±0.004) | 0.961 (±0.002) | 0.953 (±0.003) | 2416 |
-| Negated | 0.815 (±0.003) | 0.838 (±0.005) | 0.826 (±0.003) | 3064 |
-| Spec_cue | 0.811 (±0.005) | 0.868 (±0.009) | 0.839 (±0.005) | 746 |
-| Speculated | 0.685 (±0.009) | 0.719 (±0.016) | 0.701 (±0.008) | 993 |
+| Neg_cue | 0.938 (±0.004) | 0.963 (±0.003) | 0.950 (±0.002) | 2436 |
+| Negated | 0.799 (±0.018) | 0.843 (±0.008) | 0.820 (±0.010) | 3086 |
+| Spec_cue | 0.821 (±0.021) | 0.852 (±0.015) | 0.836 (±0.008) | 749 |
+| Speculated | 0.710 (±0.002) | 0.721 (±0.010) | 0.715 (±0.005) | 996 |


 ### Framework versions
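The updated overall and per-class scores in this commit can be cross-checked with a few lines of Python. This is an editor's sketch, not part of the model card: it confirms that the new overall F1 is the harmonic mean of the new precision and recall, and that a support-weighted average of the per-class F1 scores lands near the overall figure (exact agreement is not expected, since the overall numbers are presumably micro-averaged over entities rather than support-weighted).

```python
# Sanity checks on the metrics introduced by this commit (editor's sketch).

# Overall scores from the diff: F1 should be the harmonic mean of P and R.
p, r = 0.840, 0.866
f1 = 2 * p * r / (p + r)
print(round(f1, 3))  # 0.853, matching the reported overall F1

# Per-class F1 and support from the updated per-class table.
per_class = {
    "Neg_cue":    (0.950, 2436),
    "Negated":    (0.820, 3086),
    "Spec_cue":   (0.836, 749),
    "Speculated": (0.715, 996),
}
total = sum(n for _, n in per_class.values())
weighted_f1 = sum(f * n for f, n in per_class.values()) / total
print(round(weighted_f1, 3))  # 0.851, close to the reported overall 0.853
```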
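The commit also replaces the fixed `num_epochs: 8` with early stopping at patience 5. In the Hugging Face `Trainer` this is typically configured with `EarlyStoppingCallback(early_stopping_patience=5)`; the underlying rule is simply "stop once the monitored metric has not improved for 5 consecutive epochs". A minimal sketch of that rule, with made-up validation losses for illustration:

```python
def epochs_run(epoch_losses, patience=5):
    """Return the number of epochs actually run, stopping once the
    validation loss has failed to improve for `patience` epochs in a row.
    `epoch_losses` stands in for one validation-loss value per epoch."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(epoch_losses, start=1):
        if loss < best:      # improvement: remember it and reset the counter
            best = loss
            bad_epochs = 0
        else:                # no improvement this epoch
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch  # patience exhausted: stop here
    return len(epoch_losses)  # never triggered: ran to the end

# Hypothetical run: the loss last improves at epoch 6, so training
# halts at epoch 11 (6 + patience of 5).
losses = [0.9, 0.7, 0.6, 0.55, 0.52, 0.50, 0.51, 0.51, 0.52, 0.50, 0.53, 0.49]
print(epochs_run(losses))  # 11
```

This is consistent with the card's "average 10.5 epochs (±1.9)": each seed stops at a different epoch, hence a mean with a standard deviation rather than a fixed count.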
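The four labels in the card (Neg_cue, Negated, Spec_cue, Speculated) are token-level NER tags, so readable spans require merging consecutive token predictions. The card does not state the tagging scheme; assuming a standard BIO scheme (an editor's assumption), a minimal merge could look like this:

```python
def merge_bio(tokens, tags):
    """Merge per-token BIO tags into (label, text) spans.
    Assumes tags of the form 'B-Neg_cue', 'I-Negated', or 'O'."""
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new span starts here
            if current:
                spans.append(tuple(current))
            current = [tag[2:], token]
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1] += " " + token     # continue the open span
        else:                             # 'O' or an inconsistent 'I-' tag
            if current:
                spans.append(tuple(current))
            current = None
    if current:
        spans.append(tuple(current))
    return spans

# "sin dolor": 'sin' is a negation cue, 'dolor torácico' a negated concept
tokens = ["Paciente", "sin", "dolor", "torácico"]
tags = ["O", "B-Neg_cue", "B-Negated", "I-Negated"]
print(merge_bio(tokens, tags))
# [('Neg_cue', 'sin'), ('Negated', 'dolor torácico')]
```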