---
language:
- sw
license: apache-2.0
datasets:
- masakhaner
pipeline_tag: token-classification
examples: null
widget:
- text: Joe Biden ni rais wa Marekani.
  example_title: Sentence 1
- text: Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi.
  example_title: Sentence 2
- text: Mtoto anaweza kupoteza muda kabisa.
  example_title: Sentence 3
metrics:
- accuracy
---
# Swahili Named Entity Recognition
- **TUS-NER-sw** is a fine-tuned BERT model that is ready to use for **Named Entity Recognition** in Swahili and achieves **state-of-the-art performance 😀** on this task
- Fine-tuned from: [eolang/SW-v1](https://huggingface.co/eolang/SW-v1)
## Intended uses & limitations
#### How to use
You can use this model with the Transformers *pipeline* for NER.
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned Swahili NER model and its tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("eolang/SW-NER-v1")
model = AutoModelForTokenClassification.from_pretrained("eolang/SW-NER-v1")

# Build a token-classification (NER) pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

example = "Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi"
ner_results = nlp(example)
print(ner_results)
```
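The pipeline above returns one prediction per word piece. If you prefer whole entity spans, recent versions of 🤗 Transformers let you pass `aggregation_strategy="simple"` to the pipeline; this is an optional usage variant, not part of the original card:

```python
from transformers import pipeline

# Optional: merge sub-word tokens into whole entity spans
nlp_grouped = pipeline(
    "ner",
    model="eolang/SW-NER-v1",
    aggregation_strategy="simple",
)
print(nlp_grouped("Joe Biden ni rais wa Marekani."))
```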
## Training data
This model was fine-tuned on the Swahili version of the [Masakhane Dataset](https://github.com/masakhane-io/masakhane-ner/tree/main/MasakhaNER2.0/data/swa) from the [MasakhaNER project](https://github.com/masakhane-io/masakhane-ner).
MasakhaNER is a collection of Named Entity Recognition (NER) datasets for 10 different African languages.
The languages forming this dataset are: Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian-Pidgin, Swahili, Wolof, and Yorùbá.
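For reference, the Swahili split can be loaded with the 🤗 `datasets` library. This is a minimal sketch, assuming the `masakhaner` dataset with the `swa` configuration is available on the Hub:

```python
from datasets import load_dataset

# Load the Swahili (swa) configuration of MasakhaNER
masakhaner_swa = load_dataset("masakhaner", "swa")

# Each example carries tokens and BIO-style NER tags (assumed PER/ORG/LOC/DATE labels)
print(masakhaner_swa["train"][0]["tokens"])
print(masakhaner_swa["train"][0]["ner_tags"])
```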
## Training procedure
This model was trained on a single NVIDIA RTX 3090 GPU with the recommended hyperparameters from the [original BERT paper](https://arxiv.org/pdf/1810.04805).
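The exact training script is not included in the card. A minimal fine-tuning sketch using the 🤗 `Trainer` with BERT-paper-style hyperparameters (learning rate in {5e-5, 3e-5, 2e-5}, batch size 16 or 32, 2 to 4 epochs) might look like the following; all names and values here are illustrative assumptions, not the author's actual configuration:

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical fine-tuning setup; hyperparameters follow the ranges
# recommended in the original BERT paper, not the author's actual run.
model_name = "eolang/SW-v1"  # base model the card says was fine-tuned
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=9 assumes the MasakhaNER BIO tag set (O + B/I for PER, ORG, LOC, DATE)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=9)

training_args = TrainingArguments(
    output_dir="sw-ner-v1",
    learning_rate=5e-5,              # BERT paper recommends {5e-5, 3e-5, 2e-5}
    per_device_train_batch_size=32,  # BERT paper recommends {16, 32}
    num_train_epochs=3,              # BERT paper recommends {2, 3, 4}
    weight_decay=0.01,
)

# `train_dataset` / `eval_dataset` would be the tokenized and label-aligned
# MasakhaNER Swahili splits (omitted here for brevity).
trainer = Trainer(
    model=model,
    args=training_args,
    # train_dataset=tokenized_swa["train"],
    # eval_dataset=tokenized_swa["validation"],
    tokenizer=tokenizer,
)
# trainer.train()
```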