metadata
license: cc-by-nc-4.0
datasets:
  - Babelscape/wikineural
language:
  - de
  - fr
  - it
  - rm
  - multilingual
inference: false
tags:
  - named-entity-recognition

This is the SwissBERT model fine-tuned on the WikiNEuRal dataset for multilingual named entity recognition (NER).

The model supports German, French, and Italian as supervised languages, and Romansh Grischun as a zero-shot language.

Usage

from transformers import pipeline

# "simple" aggregation groups adjacent tokens of the same entity into one span.
token_classifier = pipeline(
  model="ZurichNLP/swissbert-ner",
  aggregation_strategy="simple",
)

German example

# Set the input language before running the pipeline.
token_classifier.model.set_default_language("de_CH")
token_classifier("Mein Name sei Gantenbein.")

Output:

[{'entity_group': 'PER',
  'score': 0.5002625,
  'word': 'Gantenbein',
  'start': 13,
  'end': 24}]
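
The aggregated output is a list of dictionaries whose `start` and `end` values are character offsets into the input string. As a minimal sketch of how such output might be post-processed, the helper below recovers each entity's surface form from the offsets and optionally drops low-confidence predictions. The name `extract_entities` and the `min_score` threshold are illustrative and not part of the model or the Transformers API; the sample prediction is copied from the German output above. The reported span can include a leading space (here `start` is 13, while the name begins at index 14), so the sketch strips whitespace.

```python
from typing import Any


def extract_entities(
    text: str,
    predictions: list[dict[str, Any]],
    min_score: float = 0.0,
) -> list[tuple[str, str, float]]:
    """Return (surface form, label, score) for predictions scoring at least min_score."""
    entities = []
    for pred in predictions:
        if pred["score"] < min_score:
            continue
        # Slice the original text by character offsets; the span may include
        # a leading space, so strip surrounding whitespace.
        surface = text[pred["start"]:pred["end"]].strip()
        entities.append((surface, pred["entity_group"], float(pred["score"])))
    return entities


# Prediction copied from the German example output.
text = "Mein Name sei Gantenbein."
predictions = [
    {"entity_group": "PER", "score": 0.5002625, "word": "Gantenbein", "start": 13, "end": 24}
]

print(extract_entities(text, predictions))
# [('Gantenbein', 'PER', 0.5002625)]
print(extract_entities(text, predictions, min_score=0.9))
# []
```

The 0.9 threshold is only an example value; a suitable cutoff depends on the application and should be chosen on held-out data.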

French example

# Switch the input language to French before running the pipeline.
token_classifier.model.set_default_language("fr_CH")
token_classifier("J'habite à Lausanne.")

Output:

[{'entity_group': 'LOC',
  'score': 0.99955386,
  'word': 'Lausanne',
  'start': 10,
  'end': 19}]
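
Both examples follow the same pattern: set the default language on the model, then call the pipeline. The sketch below wraps that pattern in a small helper so the language is always set explicitly before inference. The helper name `tag_entities` and the `LANGUAGE_CODES` mapping are hypothetical. Only `de_CH` and `fr_CH` appear in the examples above; `it_CH` and `rm_CH` are assumed by analogy and should be checked against the model configuration before use.

```python
from typing import Any

# "de_CH" and "fr_CH" are taken from the examples above.
# "it_CH" and "rm_CH" are assumptions; verify them against the model config.
LANGUAGE_CODES = {"de": "de_CH", "fr": "fr_CH", "it": "it_CH", "rm": "rm_CH"}


def tag_entities(token_classifier: Any, text: str, language: str) -> list[dict[str, Any]]:
    """Set the model's default language, then run the pipeline on `text`."""
    if language not in LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {language!r}")
    token_classifier.model.set_default_language(LANGUAGE_CODES[language])
    return token_classifier(text)
```

With the `token_classifier` pipeline from the Usage section, a call would look like `tag_entities(token_classifier, "J'habite à Lausanne.", "fr")`. Setting the language on every call avoids accidentally reusing the language left over from a previous request.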

Citation

@article{vamvas-etal-2023-swissbert,
      title={Swiss{BERT}: The Multilingual Language Model for Switzerland}, 
      author={Jannis Vamvas and Johannes Gra\"en and Rico Sennrich},
      year={2023},
      eprint={2303.13310},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2303.13310}
}