tthhanh committed · Commit 14eef7a · verified · 1 Parent(s): 0842e55

update description for use

Files changed (1): README.md +46 -2
README.md CHANGED
@@ -15,7 +15,7 @@ language:

# XLMR Token Classifier for Term Extraction

- This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) for term extraction tasks.
+ This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) for cross-domain term extraction tasks.

## Model description

@@ -28,6 +28,32 @@ The model is intended for term extraction tasks. It can be applied in domains li
- Named Entity Recognition (NER)
- Information Extraction

+ ## How to use
+
+ Here's a quick example of how to use the model with the Hugging Face `transformers` library:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
+
+ # Load the tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained("tthhanh/xlm-ate-nobi-en")
+ model = AutoModelForTokenClassification.from_pretrained("tthhanh/xlm-ate-nobi-en")
+
+ # Create a pipeline for token classification
+ nlp = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
+
+ # Example text
+ text = "Treatment of anemia in patients with heart disease : a clinical practice guideline from the American College of Physicians ."
+
+ # Get predictions
+ predictions = nlp(text)
+
+ # Print each predicted term span with its score
+ for prediction in predictions:
+     print(prediction)
+ ```
+
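The aggregated pipeline output can be collapsed into a candidate term list for downstream evaluation. A minimal sketch, using a hard-coded stand-in for the pipeline's output (the dict structure mirrors what `transformers` token-classification pipelines return with `aggregation_strategy="simple"`; the labels, scores, and offsets below are invented for illustration):

```python
# Hypothetical aggregated pipeline output; in practice this would be
# `predictions = nlp(text)` from the example above.
predictions = [
    {"entity_group": "TERM", "score": 0.98, "word": "anemia", "start": 13, "end": 19},
    {"entity_group": "TERM", "score": 0.95, "word": "heart disease", "start": 37, "end": 50},
]

# Collect the unique surface forms as the candidate term list
candidate_terms = sorted({p["word"] for p in predictions})
print(candidate_terms)  # ['anemia', 'heart disease']
```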
## Training and evaluation data

We fine-tuned on the English version of the ACTER dataset: we trained on the Corruption and Wind Energy domains, validated on the Equitation domain, and tested on the Heart Failure domain.
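The cross-domain split described above can be written down as a simple configuration. A sketch only: the domain identifiers below are hypothetical directory names, not the actual file layout of the ACTER release.

```python
# Illustrative domain-to-split mapping for the English ACTER corpus;
# adapt the names to your local copy of the dataset.
acter_en_splits = {
    "train": ["corruption", "wind_energy"],
    "validation": ["equitation"],
    "test": ["heart_failure"],
}

# Sanity check: no domain leaks across splits
all_domains = [d for split in acter_en_splits.values() for d in split]
assert len(all_domains) == len(set(all_domains))
```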
 
@@ -50,4 +76,22 @@ The following hyperparameters were used during training:
- Transformers 4.26.1
- Pytorch 2.0.1+cu117
- Datasets 2.9.0
- - Tokenizers 0.13.2
+ - Tokenizers 0.13.2
+
+ ## Evaluation
+
+ We evaluate the ATE system by comparing the candidate term list extracted from the test set against the manually annotated gold standard term list for that test set. We use exact string matching to compare the retrieved terms to the gold standard and calculate Precision (P), Recall (R), and F1-score (F1).
+ The results are reported in [Can cross-domain term extraction benefit from cross-lingual transfer and nested term labeling?](https://link.springer.com/article/10.1007/s10994-023-06506-7#Sec12).
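A minimal sketch of this exact-match scoring, assuming terms are compared as lowercased strings (the exact normalization used in the paper may differ), with invented toy term lists:

```python
def evaluate_terms(predicted, gold):
    """Score a candidate term list against a gold list via exact string matching."""
    pred_set = {t.lower() for t in predicted}
    gold_set = {t.lower() for t in gold}
    tp = len(pred_set & gold_set)  # exact matches
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: 2 of 3 candidates match a 4-term gold list
p, r, f1 = evaluate_terms(
    predicted=["heart failure", "anemia", "ejection fraction"],
    gold=["heart failure", "anemia", "cardiomyopathy", "guideline"],
)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.667 R=0.500 F1=0.571
```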
+
+ ## Citation
+
+ If you use this model in your research or application, please cite it as follows:
+
+ ```bibtex
+ @inproceedings{tran2022can,
+   title={Can cross-domain term extraction benefit from cross-lingual transfer?},
+   author={Tran, Hanh Thi Hong and Martinc, Matej and Doucet, Antoine and Pollak, Senja},
+   booktitle={International Conference on Discovery Science},
+   pages={363--378},
+   year={2022},
+   organization={Springer}
+ }
+ ```