tielectra-small-pos / README.md
fgaim's picture
Update README
2376b2e
metadata
language: ti
widget:
  - text: ድምጻዊ ኣብርሃም ኣፈወርቂ ንዘልኣለም ህያው ኮይኑ ኣብ ልብና ይነብር
datasets:
  - TLMD
  - NTC
metrics:
  - f1
  - precision
  - recall
  - accuracy
model-index:
  - name: tielectra-small-pos
    results:
      - task:
          name: Token Classification
          type: token-classification
        metrics:
          - name: F1
            type: f1
            value: 0.9456
          - name: Precision
            type: precision
            value: 0.9456
          - name: Recall
            type: recall
            value: 0.9456
          - name: Accuracy
            type: accuracy
            value: 0.9456

Tigrinya POS tagging with TiELECTRA

This model is a fine-tuned version of TiELECTRA on the NTC-v1 dataset (Tedla et al. 2016).

Basic usage

from transformers import pipeline

ti_pos = pipeline("token-classification", model="fgaim/tielectra-small-pos")
ti_pos("ድምጻዊ ኣብርሃም ኣፈወርቂ ንዘልኣለም ህያው ኮይኑ ኣብ ልብና ይነብር")

Training

Hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10.0

Results

The model achieves the following results on the test set:

  • Loss: 0.2236
  • Adj Precision: 0.9148
  • Adj Recall: 0.9192
  • Adj F1: 0.9170
  • Adj Number: 1670
  • Adv Precision: 0.8228
  • Adv Recall: 0.8058
  • Adv F1: 0.8142
  • Adv Number: 484
  • Con Precision: 0.9793
  • Con Recall: 0.9743
  • Con F1: 0.9768
  • Con Number: 972
  • Fw Precision: 0.5
  • Fw Recall: 0.3214
  • Fw F1: 0.3913
  • Fw Number: 28
  • Int Precision: 0.64
  • Int Recall: 0.6154
  • Int F1: 0.6275
  • Int Number: 26
  • N Precision: 0.9525
  • N Recall: 0.9587
  • N F1: 0.9556
  • N Number: 3992
  • Num Precision: 0.9825
  • Num Recall: 0.9372
  • Num F1: 0.9593
  • Num Number: 239
  • N Prp Precision: 0.9132
  • N Prp Recall: 0.9404
  • N Prp F1: 0.9266
  • N Prp Number: 470
  • N V Precision: 0.9667
  • N V Recall: 0.9760
  • N V F1: 0.9713
  • N V Number: 416
  • Pre Precision: 0.9645
  • Pre Recall: 0.9592
  • Pre F1: 0.9619
  • Pre Number: 907
  • Pro Precision: 0.9395
  • Pro Recall: 0.9079
  • Pro F1: 0.9234
  • Pro Number: 445
  • Pun Precision: 1.0
  • Pun Recall: 0.9994
  • Pun F1: 0.9997
  • Pun Number: 1607
  • Unc Precision: 0.9286
  • Unc Recall: 0.8125
  • Unc F1: 0.8667
  • Unc Number: 16
  • V Precision: 0.7609
  • V Recall: 0.8974
  • V F1: 0.8235
  • V Number: 78
  • V Aux Precision: 0.9581
  • V Aux Recall: 0.9786
  • V Aux F1: 0.9682
  • V Aux Number: 654
  • V Ger Precision: 0.9183
  • V Ger Recall: 0.9415
  • V Ger F1: 0.9297
  • V Ger Number: 513
  • V Imf Precision: 0.9473
  • V Imf Recall: 0.9442
  • V Imf F1: 0.9458
  • V Imf Number: 914
  • V Imv Precision: 0.8163
  • V Imv Recall: 0.5714
  • V Imv F1: 0.6723
  • V Imv Number: 70
  • V Prf Precision: 0.8927
  • V Prf Recall: 0.8776
  • V Prf F1: 0.8851
  • V Prf Number: 294
  • V Rel Precision: 0.9535
  • V Rel Recall: 0.9485
  • V Rel F1: 0.9510
  • V Rel Number: 757
  • Overall Precision: 0.9456
  • Overall Recall: 0.9456
  • Overall F1: 0.9456
  • Overall Accuracy: 0.9456

Framework versions

  • Transformers 4.10.3
  • Pytorch 1.9.0+cu111
  • Datasets 1.10.2
  • Tokenizers 0.10.1

Citation

If you use this model in your product or research, please cite as follows:

@article{Fitsum2021TiPLMs,
  author= {Fitsum Gaim and Wonsuk Yang and Jong C. Park},
  title= {Monolingual Pre-trained Language Models for Tigrinya},
  year= 2021,
  publisher= {WiNLP 2021/EMNLP 2021}
}

References

Tedla, Y., Yamamoto, K. & Marasinghe, A. 2016.
Tigrinya Part-of-Speech Tagging with Morphological Patterns and the New Nagaoka Tigrinya Corpus.
International Journal Of Computer Applications 146 pp. 33-41 (2016).