metadata

language:
  - sw
license: apache-2.0
datasets:
  - masakhaner
pipeline_tag: token-classification
examples: null
widget:
  - text: Joe Bidden ni rais wa marekani.
    example_title: Sentence 1
  - text: Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi.
    example_title: Sentence 2
  - text: Mtoto anaweza kupoteza muda kabisa.
    example_title: Sentence 3
metrics:
  - accuracy

Swahili Named Entity Recognition

TUS-NER-sw is a BERT model fine-tuned for Named Entity Recognition (NER) in Swahili. It is ready to use as-is and is reported to achieve state-of-the-art performance 😀
Fine-tuned from model: eolang/SW-v1

Intended uses & limitations

How to use

You can use this model with the Transformers pipeline for NER.

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("eolang/SW-NER-v1")
model = AutoModelForTokenClassification.from_pretrained("eolang/SW-NER-v1")

# Build an NER pipeline and run it on a Swahili sentence
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi"

# Each result is a dict describing one tagged token
ner_results = nlp(example)
print(ner_results)
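
The pipeline above returns one dictionary per tagged token, so a multi-word name arrives as several entries. The sketch below shows one way to merge such entries into whole entities using their character offsets. The `group_entities` helper is hypothetical, and the sample token list is hand-written for illustration; it is not real output from this model, and it assumes BIO-style labels such as `B-PER` and `I-PER`.

```python
from typing import Dict, List, Tuple


def group_entities(text: str, tokens: List[Dict]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (entity_type, surface_text) pairs."""
    spans = []  # each item is [entity_type, start_offset, end_offset]
    for tok in tokens:
        label = tok["entity"]
        if label == "O":
            continue
        prefix, _, ent_type = label.partition("-")
        # An I- token extends the previous span only if the type matches
        # and the token is adjacent to it (at most one character apart).
        continues = bool(
            prefix == "I"
            and spans
            and spans[-1][0] == ent_type
            and tok["start"] - spans[-1][2] <= 1
        )
        if continues:
            spans[-1][2] = tok["end"]
        else:
            spans.append([ent_type, tok["start"], tok["end"]])
    return [(ent_type, text[start:end]) for ent_type, start, end in spans]


# Hand-written sample in the shape the "ner" pipeline returns (assumed labels).
sample_text = "Joe Biden ni rais wa Marekani."
sample_tokens = [
    {"entity": "B-PER", "word": "Joe", "start": 0, "end": 3},
    {"entity": "I-PER", "word": "Biden", "start": 4, "end": 9},
    {"entity": "B-LOC", "word": "Marekani", "start": 21, "end": 29},
]

entities = group_entities(sample_text, sample_tokens)
print(entities)  # [('PER', 'Joe Biden'), ('LOC', 'Marekani')]
```

In practice, passing `aggregation_strategy="simple"` to `pipeline(...)` asks Transformers to do this grouping for you; the manual version is mainly useful when you need custom merging rules.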

Training data

This model was fine-tuned on the Swahili portion of the MasakhaNER dataset from the Masakhane project. MasakhaNER is a collection of Named Entity Recognition (NER) datasets for 10 African languages: Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian-Pidgin, Swahili, Wolof, and Yorùbá.
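
For readers interpreting the predicted tags, the sketch below builds label maps for the BIO tag set used by the MasakhaNER dataset (person, organization, location, and date entities). That this model's configuration uses the same labels in the same order is an assumption; check `model.config.id2label` for the authoritative mapping.

```python
# MasakhaNER tag set in BIO format. The ordering shown here is an assumption
# about this model; model.config.id2label is the source of truth.
LABELS = [
    "O",
    "B-PER", "I-PER",    # person
    "B-ORG", "I-ORG",    # organization
    "B-LOC", "I-LOC",    # location
    "B-DATE", "I-DATE",  # date and time expressions
]

id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}

# Entity types without the B-/I- prefixes
entity_types = sorted({label.split("-", 1)[1] for label in LABELS if label != "O"})
print(entity_types)  # ['DATE', 'LOC', 'ORG', 'PER']
```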

Training procedure

This model was trained on a single NVIDIA RTX 3090 GPU using the fine-tuning hyperparameters recommended in the original BERT paper.
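
The original BERT paper recommends a small search space for fine-tuning rather than a single setting: batch size 16 or 32, Adam learning rate 5e-5, 3e-5, or 2e-5, and 2 to 4 epochs. The sketch below enumerates that grid. Which combination was used for this model is not stated here, and the names `BERT_FINETUNE_GRID` and `configs` are illustrative only.

```python
from itertools import product

# Fine-tuning ranges suggested in the original BERT paper.
# The specific values used for this model are not documented here.
BERT_FINETUNE_GRID = {
    "batch_size": [16, 32],
    "learning_rate": [5e-5, 3e-5, 2e-5],
    "num_epochs": [2, 3, 4],
}

# Expand the grid into one dict per candidate configuration
keys = list(BERT_FINETUNE_GRID)
configs = [dict(zip(keys, values)) for values in product(*BERT_FINETUNE_GRID.values())]

print(len(configs))  # 18
print(configs[0])    # {'batch_size': 16, 'learning_rate': 5e-05, 'num_epochs': 2}
```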