---
language:
- en
library_name: flair
pipeline_tag: token-classification
base_model: FacebookAI/xlm-roberta-large
widget:
- text: According to the BBC George Washington went to Washington.
---
# Flair NER Model trained on CleanCoNLL Dataset
This (unofficial) Flair NER model was trained on the awesome [CleanCoNLL](https://aclanthology.org/2023.emnlp-main.533/) dataset.
The CleanCoNLL dataset was proposed by Susanna Rücker and Alan Akbik and introduces a corrected version of the classic CoNLL-03 dataset, with updated and more consistent NER labels.
## Fine-Tuning
We use XLM-RoBERTa Large as the backbone language model and the following hyper-parameters for fine-tuning:

| Hyper-Parameter | Value   |
|:--------------- |:------- |
| Batch Size      | `4`     |
| Learning Rate   | `5e-06` |
| Max. Epochs     | `10`    |
Additionally, the [FLERT](https://arxiv.org/abs/2011.06993) approach is used for fine-tuning the model.
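No training script ships with this model card, but the setup described above can be sketched with Flair's trainer API. This is a minimal sketch, not the exact script used for this model: the output path is a placeholder, and the `CLEANCONLL` corpus class as well as the embedding options (`layers`, `subtoken_pooling`) reflect common FLERT-style defaults rather than confirmed settings.

```python
from flair.datasets import CLEANCONLL  # assumes a recent Flair with CleanCoNLL support
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# load the CleanCoNLL corpus and build the NER label dictionary
corpus = CLEANCONLL()
label_dict = corpus.make_label_dictionary(label_type="ner")

# FLERT: fine-tune transformer embeddings with document-level context
embeddings = TransformerWordEmbeddings(
    model="FacebookAI/xlm-roberta-large",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=True,  # the FLERT context trick
)

# plain linear tagger on top (no CRF, no RNN), as in the FLERT fine-tuning recipe
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# fine-tune with the hyper-parameters from the table above
trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/clean-conll",  # hypothetical output path
    learning_rate=5e-6,
    mini_batch_size=4,
    max_epochs=10,
)
```

Note that `fine_tune()` uses a linear learning-rate schedule with warmup, which is the usual companion to the small learning rate listed above.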
## Results
We report the micro F1-score on the development set (in parentheses) and the test set for five runs with different seeds:

| Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Avg. |
|:--------------- |:--------------- |:--------------- |:--------------- |:--------------- |:--------------- |
| (97.34) / 97.00 | (97.26) / 96.90 | (97.66) / 97.02 | (97.42) / 96.96 | (97.46) / 96.99 | (97.43) / 96.97 |
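As a reminder, micro F1 pools true positives, false positives and false negatives across all entity types before computing precision and recall, so frequent types (PER, ORG, LOC) weigh more than rare ones. A minimal sketch with made-up per-type counts:

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-type (tp, fp, fn) counts."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    # micro F1 simplifies to 2*tp / (2*tp + fp + fn)
    return 2 * tp / (2 * tp + fp + fn)

# hypothetical counts per entity type: (true positives, false positives, false negatives)
counts = {
    "PER": (95, 3, 4),
    "ORG": (88, 6, 7),
    "LOC": (90, 4, 5),
    "MISC": (40, 5, 6),
}
print(f"{micro_f1(counts):.4f}")  # → 0.9399
```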
Rücker and Akbik report an average of 96.98 over three runs, so our results are very close to their reported performance!
## Flair Demo
The following snippet shows how to use this CleanCoNLL NER model with Flair:
```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("stefan-it/flair-clean-conll-5")

# make example sentence
sentence = Sentence("According to the BBC George Washington went to Washington.")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)

# print predicted NER spans
print('The following NER tags are found:')

# iterate over entities and print
for entity in sentence.get_spans('ner'):
    print(entity)
```