tthhanh committed on
Commit 912abc2 · verified · 1 Parent(s): 8518b3e

update description

Files changed (1):
  1. README.md +60 -33

README.md CHANGED
@@ -7,39 +7,61 @@ metrics:
   - recall
   - f1
 model-index:
-- name: xlm-ate-nobi-en-nes
   results: []
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->

-# xlm-ate-nobi-en-nes
-
-This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 1.2581
-- Precision: 0.5875
-- Recall: 0.4794
-- F1: 0.5280

 ## Model description

-More information needed

 ## Intended uses & limitations

-More information needed

 ## Training and evaluation data

-More information needed

 ## Training procedure

-### Training hyperparameters
-
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
 - train_batch_size: 32
 - eval_batch_size: 32
@@ -47,25 +69,30 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - num_epochs: 20

-### Training results
-
-| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     |
-|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|
-| 0.2205        | 1.85  | 500  | 0.5622          | 0.5705    | 0.4546 | 0.5060 |
-| 0.077         | 3.69  | 1000 | 0.7307          | 0.5715    | 0.4819 | 0.5229 |
-| 0.0421        | 5.54  | 1500 | 0.8561          | 0.5725    | 0.4965 | 0.5318 |
-| 0.0253        | 7.38  | 2000 | 0.8979          | 0.5601    | 0.5181 | 0.5383 |
-| 0.0157        | 9.23  | 2500 | 1.1252          | 0.6047    | 0.4565 | 0.5203 |
-| 0.0099        | 11.07 | 3000 | 1.1651          | 0.5874    | 0.4781 | 0.5271 |
-| 0.0077        | 12.92 | 3500 | 1.0574          | 0.5471    | 0.5270 | 0.5369 |
-| 0.0052        | 14.76 | 4000 | 1.1903          | 0.5879    | 0.4863 | 0.5323 |
-| 0.0034        | 16.61 | 4500 | 1.2581          | 0.5875    | 0.4794 | 0.5280 |
-
-
-### Framework versions
-
 - Transformers 4.26.1
 - Pytorch 2.0.1+cu117
 - Datasets 2.9.0
 - Tokenizers 0.13.2
   - recall
   - f1
 model-index:
+- name: xlm-ate-nobi-en
   results: []
+language:
+- en
 ---

+# XLMR Token Classifier for Term Extraction

+This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) for cross-domain term extraction tasks.

 ## Model description

+This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) for token classification, specifically designed to identify and classify terms within text sequences. The model assigns labels such as B-Term, I-Term, BN-Term, IN-Term, and O to individual tokens, allowing meaningful terms to be extracted from the text.
+

 ## Intended uses & limitations

+The model is intended for term extraction. It can also support related tasks such as:
+- Named Entity Recognition (NER)
+- Information Extraction
+
+## How to use
+
+Here is a quick example of how to use the model with the Hugging Face `transformers` library:
+
+```python
+from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
+
+# Load the tokenizer and model
+tokenizer = AutoTokenizer.from_pretrained("tthhanh/xlm-ate-nobi-en-nes")
+model = AutoModelForTokenClassification.from_pretrained("tthhanh/xlm-ate-nobi-en-nes")
+
+# Create a pipeline for token classification
+nlp = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
+
+# Example text
+text = "Treatment of anemia in patients with heart disease : a clinical practice guideline from the American College of Physicians ."
+
+# Get predictions
+predictions = nlp(text)
+
+# Print predictions
+for prediction in predictions:
+    print(prediction)
+```

 ## Training and evaluation data

+We fine-tuned the model on the English version of the ACTER dataset, in which Named Entities are included in the gold standard. We trained on the Corruption and Wind Energy domains, validated on the Equitation domain, and tested on the Heart Failure domain.

 ## Training procedure

 The following hyperparameters were used during training:
+```
 - learning_rate: 2e-05
 - train_batch_size: 32
 - eval_batch_size: 32
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - num_epochs: 20
+```

+Framework versions:
+```
 - Transformers 4.26.1
 - Pytorch 2.0.1+cu117
 - Datasets 2.9.0
 - Tokenizers 0.13.2
+```
+
+## Evaluation
+
+We evaluate the performance of the ATE systems by comparing the candidate list extracted from the test set with the manually annotated gold standard term list for that specific test set. We use exact string matching to compare the retrieved terms to the ones in the gold standard and calculate Precision (P), Recall (R), and F1-score (F1).
+The results are reported in [Can cross-domain term extraction benefit from cross-lingual transfer and nested term labeling?](https://link.springer.com/article/10.1007/s10994-023-06506-7#Sec12).
+
+## Citation
+
+If you use this model in your research or application, please cite it as follows:
+```
+@inproceedings{tran2022can,
+  title={Can cross-domain term extraction benefit from cross-lingual transfer?},
+  author={Tran, Hanh Thi Hong and Martinc, Matej and Doucet, Antoine and Pollak, Senja},
+  booktitle={International Conference on Discovery Science},
+  pages={363--378},
+  year={2022},
+  organization={Springer}
+}
+```
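
The model description in this diff mentions BIO-style labels (B-Term/I-Term for the flat term layer, BN-Term/IN-Term for the nested layer, O for everything else). Decoding such a tag sequence into term strings can be sketched as below; this is an illustrative helper, not code from the model repository, and the `decode_terms` name and example tags are assumptions for demonstration.

```python
def decode_terms(tokens, tags, begin="B-Term", inside="I-Term"):
    """Group a BIO-style tag sequence into term strings.

    Decodes one labeling layer at a time: pass begin="BN-Term",
    inside="IN-Term" to decode the nested layer instead.
    """
    terms, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == begin:
            if current:                      # flush a term already in progress
                terms.append(" ".join(current))
            current = [token]                # start a new term
        elif tag == inside and current:
            current.append(token)            # continue the current term
        else:
            if current:                      # O (or stray I-) ends the term
                terms.append(" ".join(current))
            current = []
    if current:                              # flush a term at sequence end
        terms.append(" ".join(current))
    return terms

tokens = ["Treatment", "of", "anemia", "in", "patients", "with", "heart", "disease"]
tags   = ["O", "O", "B-Term", "O", "O", "O", "B-Term", "I-Term"]
print(decode_terms(tokens, tags))  # ['anemia', 'heart disease']
```

In practice the tags would come from the token-classification pipeline shown in the usage example rather than being written by hand.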
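
The evaluation protocol described in the new Evaluation section (exact string matching between the extracted candidate list and the gold standard term list, scored with Precision, Recall, and F1) can be sketched as follows. The `exact_match_prf` helper is hypothetical, and lowercasing before comparison is an assumption that may differ from the paper's exact setup.

```python
def exact_match_prf(candidates, gold):
    """Precision, Recall, and F1 of a candidate term list against a gold
    standard term list, using exact string matching over lowercased terms
    (set semantics: duplicates are counted once)."""
    cand = {t.lower() for t in candidates}
    gold_set = {t.lower() for t in gold}
    tp = len(cand & gold_set)                      # exact matches
    precision = tp / len(cand) if cand else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: 2 of 3 candidates match the gold list exactly.
p, r, f1 = exact_match_prf(
    ["anemia", "heart disease", "guideline"],
    ["anemia", "heart disease", "clinical practice guideline"],
)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667
```

Note that exact matching gives no credit for partial overlaps such as "guideline" vs. "clinical practice guideline", which is one reason the reported scores are conservative.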