metadata
license: cc-by-4.0
datasets:
- wikiann
language:
- pl
pipeline_tag: token-classification
widget:
- text: >-
    Nazywam się Grzegorz Brzęszczyszczykiewicz, pochodzę z
    Chrząszczyżewoszczyc, pracuję w Łękołodzkim Urzędzie Powiatowym
- text: Jestem Krzysiek i pracuję w Ministerstwie Sportu
- text: Na imię jej Wiktoria, pracuje w Krakowie na AGH
model-index:
- name: herbert-base-ner
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: wikiann
      type: wikiann
      config: pl
      split: test
      args: pl
    metrics:
    - name: Precision
      type: precision
      value: 0.8857142857142857
    - name: Recall
      type: recall
      value: 0.9070532179048386
    - name: F1
      type: f1
      value: 0.896256755412619
    - name: Accuracy
      type: accuracy
      value: 0.9581463871961428
herbert-base-ner
Model description
herbert-base-ner is a fine-tuned HerBERT model for Named Entity Recognition (NER). It has been trained to recognize three types of entities: person (PER), location (LOC) and organization (ORG).
Specifically, this model is an allegro/herbert-base-cased model that was fine-tuned on the Polish subset of the wikiann dataset.
How to use
You can use this model with the Transformers pipeline for NER.
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub.
model_checkpoint = "pietruszkowiec/herbert-base-ner"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint)

# Build a token-classification ("ner") pipeline around the model.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = (
    "Nazywam się Grzegorz Brzęszczyszczykiewicz, pochodzę "
    "z Chrząszczyżewoszczyc, pracuję w Łękołodzkim Urzędzie Powiatowym"
)
# The pipeline returns a list of dicts, one per token tagged as part of an entity.
ner_results = nlp(example)
print(ner_results)
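Token-level predictions usually have to be merged into whole entities before they are useful. The following is a minimal sketch of that merging step, assuming the model emits IOB2-style labels such as `B-PER` and `I-ORG`. The `group_entities` helper and the sample tags are hypothetical and hand-written for illustration; they are not actual output of this model.

```python
from typing import Dict, List


def group_entities(tokens: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Merge IOB2-tagged tokens into entity spans (hypothetical helper)."""
    entities: List[Dict[str, str]] = []
    current = None  # entity currently being built, if any
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current is not None and current["label"] == label:
            # Continuation of the entity that is already open.
            current["text"] += " " + token
        elif prefix in ("B", "I"):
            # "B-" starts a new entity; a stray "I-" is treated the same way.
            if current is not None:
                entities.append(current)
            current = {"label": label, "text": token}
        else:
            # "O" closes any open entity.
            if current is not None:
                entities.append(current)
            current = None
    if current is not None:
        entities.append(current)
    return entities


# Hand-written tags for one of the widget sentences (an assumption, not model output).
tokens = ["Jestem", "Krzysiek", "i", "pracuję", "w", "Ministerstwie", "Sportu"]
tags = ["O", "B-PER", "O", "O", "O", "B-ORG", "I-ORG"]
print(group_entities(tokens, tags))
```

On real pipeline output, passing `aggregation_strategy="simple"` to `pipeline(...)` should give comparable grouping without a custom helper.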
BibTeX entry and citation info
@inproceedings{mroczkowski-etal-2021-herbert,
title = "{H}er{BERT}: Efficiently Pretrained Transformer-based Language Model for {P}olish",
author = "Mroczkowski, Robert and
Rybak, Piotr and
Wr{\'o}blewska, Alina and
Gawlik, Ireneusz",
booktitle = "Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing",
month = apr,
year = "2021",
address = "Kyiv, Ukraine",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2021.bsnlp-1.1",
pages = "1--10",
}
@inproceedings{pan-etal-2017-cross,
title = "Cross-lingual Name Tagging and Linking for 282 Languages",
author = "Pan, Xiaoman and
Zhang, Boliang and
May, Jonathan and
Nothman, Joel and
Knight, Kevin and
Ji, Heng",
booktitle = "Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2017",
address = "Vancouver, Canada",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P17-1178",
doi = "10.18653/v1/P17-1178",
pages = "1946--1958",
abstract = "The ambitious goal of this work is to develop a cross-lingual name tagging and linking framework for 282 languages that exist in Wikipedia. Given a document in any of these languages, our framework is able to identify name mentions, assign a coarse-grained or fine-grained type to each mention, and link it to an English Knowledge Base (KB) if it is linkable. We achieve this goal by performing a series of new KB mining methods: generating {``}silver-standard{''} annotations by transferring annotations from English to other languages through cross-lingual links and KB properties, refining annotations through self-training and topic selection, deriving language-specific morphology features from anchor links, and mining word translation pairs from cross-lingual links. Both name tagging and linking results for 282 languages are promising on Wikipedia data and non-Wikipedia data.",
}