---
language:
- sw
license: apache-2.0
datasets:
- masakhaner
pipeline_tag: token-classification
examples: null
widget:
- text: Joe Biden ni rais wa Marekani.
  example_title: Sentence 1
- text: Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi.
  example_title: Sentence 2
- text: Mtoto anaweza kupoteza muda kabisa.
  example_title: Sentence 3
metrics:
- accuracy
---

# Swahili Named Entity Recognition

- **TUS-NER-sw** is a fine-tuned BERT model, ready to use for **Named Entity Recognition**, that achieves **state-of-the-art performance 😀**
- Fine-tuned from: [eolang/SW-v1](https://huggingface.co/eolang/SW-v1)

## Intended uses & limitations

#### How to use

You can use this model with the Transformers *pipeline* for NER.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("eolang/SW-NER-v1")
model = AutoModelForTokenClassification.from_pretrained("eolang/SW-NER-v1")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi"

ner_results = nlp(example)
print(ner_results)
```

## Training data

This model was fine-tuned on the Swahili portion of the [Masakhane dataset](https://github.com/masakhane-io/masakhane-ner/tree/main/MasakhaNER2.0/data/swa) from the [MasakhaNER project](https://github.com/masakhane-io/masakhane-ner). MasakhaNER is a collection of Named Entity Recognition (NER) datasets for 10 African languages: Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian-Pidgin, Swahili, Wolof, and Yorùbá.

## Training procedure

This model was trained on a single NVIDIA RTX 3090 GPU with the hyperparameters recommended in the [original BERT paper](https://arxiv.org/pdf/1810.04805).
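The `ner` pipeline returns one prediction per token, tagged with the BIO scheme (`B-PER`, `I-PER`, `O`, …), so multi-word entities arrive split across several dicts. The sketch below shows one way to merge those token-level tags into whole entity spans. Note this is an illustrative post-processing example, not part of the model: the `sample` predictions are hand-written to mimic the pipeline's output format (each dict has `word` and `entity` keys), and the actual tags and scores produced by `eolang/SW-NER-v1` may differ.

```python
# Group token-level BIO predictions into (label, text) entity spans.
# `predictions` mimics the list of dicts returned by the transformers
# "ner" pipeline; only the "word" and "entity" keys are used here.

def group_entities(predictions):
    """Merge B-/I- tagged tokens into (label, text) entity spans."""
    entities = []
    current_label, current_tokens = None, []
    for pred in predictions:
        tag = pred["entity"]  # e.g. "B-PER", "I-PER", or "O"
        if tag.startswith("B-"):
            # A B- tag starts a new entity; flush any open one first.
            if current_tokens:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [pred["word"]]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with a matching label continues the open entity.
            current_tokens.append(pred["word"])
        else:
            # "O" (or an inconsistent I- tag) closes the open entity.
            if current_tokens:
                entities.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:
        entities.append((current_label, " ".join(current_tokens)))
    return entities

# Hand-written sample matching the pipeline's output shape.
sample = [
    {"word": "Joe", "entity": "B-PER"},
    {"word": "Biden", "entity": "I-PER"},
    {"word": "ni", "entity": "O"},
    {"word": "rais", "entity": "O"},
    {"word": "wa", "entity": "O"},
    {"word": "Marekani", "entity": "B-LOC"},
]
print(group_entities(sample))  # [('PER', 'Joe Biden'), ('LOC', 'Marekani')]
```

Alternatively, recent versions of Transformers can do this grouping for you: pass `aggregation_strategy="simple"` when constructing the pipeline to get merged entity spans directly.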