metadata
license: mit
base_model: xlm-roberta-base
tags:
  - silvanus
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: xlm-roberta-base-ner-silvanus
    results:
      - task:
          name: Token Classification
          type: token-classification
        dataset:
          name: id_nergrit_corpus
          type: id_nergrit_corpus
          config: ner
          split: validation
          args: ner
        metrics:
          - name: Precision
            type: precision
            value: 0.918918918918919
          - name: Recall
            type: recall
            value: 0.9272727272727272
          - name: F1
            type: f1
            value: 0.9230769230769231
          - name: Accuracy
            type: accuracy
            value: 0.9858518778229216
language:
  - id
  - en
  - es
  - it
  - sk
pipeline_tag: token-classification
widget:
  - text: >-
      Kebakaran hutan dan lahan terus terjadi dan semakin meluas di Kota
      Palangkaraya, Kalimantan Tengah (Kalteng) pada hari Rabu, 15 Nopember 2023
      20.00 WIB. Bahkan kobaran api mulai membakar pondok warga dan mendekati
      permukiman. BZK #RCTINews #SeputariNews #News #Karhutla #KebakaranHutan
      #HutanKalimantan #SILVANUS_Italian_Pilot_Testing
    example_title: Indonesia
  - text: >-
      Wildfire rages for a second day in Evia destroying a Natura 2000 protected
      pine forest. - 5:51 PM Aug 14, 2019
    example_title: English
  - text: >-
      3 nov 2023 21:57 - Incendio forestal obliga a la evacuación de hasta 850
      personas cerca del pueblo de Montichelvo en Valencia.
    example_title: Spanish
  - text: >-
      Incendi boschivi nell'est del Paese: 2 morti e oltre 50 case distrutte
      nello stato del Queensland.
    example_title: Italian
  - text: >-
      Lesné požiare na Sicílii si vyžiadali dva ľudské životy a evakuáciu hotela
      http://dlvr.it/SwW3sC - 23. septembra 2023 20:57
    example_title: Slovak

xlm-roberta-base-ner-silvanus

This model is a fine-tuned version of xlm-roberta-base on an Indonesian NER dataset (listed as id_nergrit_corpus in the metadata). It achieves the following results on the validation split:

  • Loss: 0.0595
  • Precision: 0.9189
  • Recall: 0.9273
  • F1: 0.9231
  • Accuracy: 0.9859
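As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the unrounded precision and recall values given in the model metadata; the function name `f1_score` is only illustrative and is not taken from the training code.

```python
# Unrounded precision and recall as reported in the model-index metadata.
precision = 0.918918918918919
recall = 0.9272727272727272


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.9231, matching the reported F1
```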

Model description

The XLM-RoBERTa model was proposed in Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data.

  • Developed by: See associated paper
  • Model type: Multi-lingual model
  • Language(s) (NLP) or Countries (images): XLM-RoBERTa is a multilingual model trained on 100 different languages; see GitHub Repo for full list; this model is fine-tuned on a dataset in Indonesian
  • License: MIT (as declared in the model metadata)
  • Related Models: RoBERTa, XLM
  • Resources for more information: GitHub Repo

Intended uses & limitations

This model can be used to extract information such as location, date and time from multilingual social media posts (Twitter, etc.). Its main limitation is that the training data is Indonesian only: the other four languages (English, Spanish, Italian and Slovak) are handled through zero-shot cross-lingual transfer, so they are tested without any language-specific fine-tuning.
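A minimal usage sketch with the Transformers `pipeline` API follows. The repository id in `MODEL_ID` is an assumption pieced together from the uploader name and the model name, and `load_tagger` and `group_entities` are hypothetical helper names, so adjust them to your setup. `aggregation_strategy="simple"` merges word pieces into whole entity spans.

```python
from collections import defaultdict

# Assumed repository id (uploader name + model name); not confirmed by this card.
MODEL_ID = "rollerhafeezh-amikom/xlm-roberta-base-ner-silvanus"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline; downloads the model on first use."""
    from transformers import pipeline  # imported lazily so the helper below works without it

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge sub-word tokens into entity spans
    )


def group_entities(entities):
    """Bucket aggregated pipeline output by entity label (LOC, DAT, TIM)."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["entity_group"]].append(ent["word"].strip())
    return dict(grouped)


# Example (requires network access and the transformers package):
# tagger = load_tagger()
# text = "Wildfire rages for a second day in Evia. - 5:51 PM Aug 14, 2019"
# print(group_entities(tagger(text)))
```

Because the non-Indonesian languages rely on zero-shot transfer, it is worth spot-checking the output on a few posts in each target language before relying on it.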

Training and evaluation data

This model was fine-tuned on Indonesian NER datasets. It predicts the following labels:

| Abbreviation | Description |
|---|---|
| O | Outside of a named entity |
| B-LOC | Beginning of a location right after another location |
| I-LOC | Location |
| B-DAT | Beginning of a date right after another date |
| I-DAT | Date |
| I-TIM | Time |
| B-TIM | Beginning of a time right after another time |
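To illustrate how the tags above turn into entity spans, here is a small decoder sketch. It assumes one tag per whitespace token and starts a new span either at a `B-` tag or when the entity type changes, so it accepts sequences where `B-` appears only between adjacent entities as well as sequences where every entity begins with `B-`. The function name `decode_spans` is hypothetical.

```python
def decode_spans(tokens, tags):
    """Convert parallel token/tag lists into (label, text) entity spans."""
    spans = []
    current_label, current_tokens = None, []

    def flush():
        nonlocal current_label, current_tokens
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if not label:  # "O" or any tag without an entity type
            flush()
        elif prefix == "B" or label != current_label:
            flush()  # close the previous entity and open a new one
            current_label, current_tokens = label, [token]
        else:
            current_tokens.append(token)  # I- tag continuing the same entity
    flush()
    return spans


tokens = ["Kebakaran", "di", "Palangkaraya", "pada", "15", "Nopember", "2023"]
tags = ["O", "O", "I-LOC", "O", "I-DAT", "I-DAT", "I-DAT"]
print(decode_spans(tokens, tags))
# [('LOC', 'Palangkaraya'), ('DAT', '15 Nopember 2023')]
```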

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
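For reference, the hyperparameters above can be written with the keyword names used by `transformers.TrainingArguments`, and the linear schedule can be sketched in plain Python. The step counts (827 per epoch, 2481 in total) come from the training results table; zero warmup steps is an assumption, since the card does not state a warmup setting, and `linear_lr` is a hypothetical helper rather than the scheduler actually used.

```python
# Hyperparameters from the list above, keyed by TrainingArguments keyword names.
hparams = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
}

STEPS_PER_EPOCH = 827  # from the training results table
TOTAL_STEPS = STEPS_PER_EPOCH * hparams["num_train_epochs"]  # 2481


def linear_lr(step, base_lr=hparams["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate under linear warmup then linear decay to zero.

    warmup_steps=0 is an assumption; the card does not report a warmup value.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for epoch in range(4):
    step = epoch * STEPS_PER_EPOCH
    print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

With a batch size of 8 and 827 steps per epoch, the training split would hold at most 827 × 8 = 6616 examples, assuming a single device and no gradient accumulation.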

Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| 0.1394 | 1.0 | 827 | 0.0559 | 0.8808 | 0.9257 | 0.9027 | 0.9842 |
| 0.0468 | 2.0 | 1654 | 0.0575 | 0.9107 | 0.9190 | 0.9148 | 0.9849 |
| 0.0279 | 3.0 | 2481 | 0.0595 | 0.9189 | 0.9273 | 0.9231 | 0.9859 |

Framework versions

  • Transformers 4.35.0
  • PyTorch 2.1.0+cu118
  • Datasets 2.14.6
  • Tokenizers 0.14.1