# Amharic BPE Tokenizer
This repo contains a Byte-Pair Encoding (BPE) tokenizer trained on the Amharic subset of the OSCAR dataset. It uses the same architecture as the GPT-2 tokenizer, but is trained from scratch on Amharic text with a vocabulary size of 24,000.
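
For reference, here is a minimal sketch of how a tokenizer like this can be trained with the `transformers` and `datasets` libraries. The OSCAR config name, batching helper, and training parameters are illustrative assumptions, not the exact script used for this repo.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Amharic subset of OSCAR (config name is an assumption; adjust to the OSCAR release you use)
dataset = load_dataset("oscar", "unshuffled_deduplicated_am", split="train")

def batch_iterator(batch_size=1000):
    # Yield batches of raw text to the tokenizer trainer
    for i in range(0, len(dataset), batch_size):
        yield dataset[i : i + batch_size]["text"]

# Reuse the GPT-2 tokenizer's algorithm (byte-level BPE) but learn new merges from scratch
base_tokenizer = AutoTokenizer.from_pretrained("gpt2")
amharic_tokenizer = base_tokenizer.train_new_from_iterator(batch_iterator(), vocab_size=24000)
amharic_tokenizer.save_pretrained("gpt2-oscar-amharic-tokenizer")
```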
## How to use

You can load the tokenizer from the Hugging Face Hub as follows:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rasyosef/gpt2-oscar-amharic-tokenizer")
tokenizer("ሰላም ለዓለም")  # tokenize a sample Amharic sentence ("hello, world")
```
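
The tokenizer returns a standard `transformers` encoding. A quick way to inspect the output (the sample sentence is again an assumption, any Amharic text works):

```python
# Encode a sample sentence, look at the learned subword tokens, and round-trip back to text
encoding = tokenizer("ሰላም ለዓለም")
print(encoding.input_ids)                                   # token ids
print(tokenizer.convert_ids_to_tokens(encoding.input_ids))  # subword strings
print(tokenizer.decode(encoding.input_ids))                 # back to the original text
```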