Amharic Tokenizer
Model Details
- Vocabulary Size: 100,000
- Tokenizer Type: Byte-Pair Encoding (BPE)
Model Description
- Developed by: Biniyam Ajaw
- Language(s) (NLP): Amharic and Amharic-Driven Languages
- License: MIT
Uses
The tokenizer can be loaded with the AutoTokenizer class from the transformers package and used to tokenize Amharic text.
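As a minimal sketch of that usage: the snippet below loads a tokenizer with AutoTokenizer.from_pretrained and splits a string into tokens and token IDs. The repository ID for this tokenizer is not stated in this card, so it is left as a parameter; the helper names load_tokenizer and tokenize_text are illustrative and not part of the transformers API.

```python
def load_tokenizer(repo_id):
    """Load a tokenizer from the Hugging Face Hub or a local directory.

    repo_id is an assumption: this card does not state the repository ID,
    so pass the actual ID (or a local path) of this tokenizer.
    """
    # Imported here so the module can be imported without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def tokenize_text(tokenizer, text):
    """Return the subword tokens and their vocabulary IDs for text."""
    tokens = tokenizer.tokenize(text)              # list of subword strings
    ids = tokenizer.convert_tokens_to_ids(tokens)  # list of integer IDs
    return tokens, ids
```

With the real repository ID in place of the placeholder, tokenize_text(load_tokenizer("<repo-id>"), "ሰላም ለዓለም") would return the BPE tokens for the sentence together with their IDs. The exact split depends on the learned merges, so no expected output is shown here.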