---
license: mit
base_model: xlm-roberta-base
tags:
- silvanus
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: xlm-roberta-base-ner-silvanus
results:
- task:
name: Token Classification
type: token-classification
dataset:
name: id_nergrit_corpus
type: id_nergrit_corpus
config: ner
split: validation
args: ner
metrics:
- name: Precision
type: precision
value: 0.918918918918919
- name: Recall
type: recall
value: 0.9272727272727272
- name: F1
type: f1
value: 0.9230769230769231
- name: Accuracy
type: accuracy
value: 0.9858518778229216
language:
- id
- en
- es
- it
- sk
pipeline_tag: token-classification
widget:
- text: >-
Kebakaran hutan dan lahan terus terjadi dan semakin meluas di Kota
Palangkaraya, Kalimantan Tengah (Kalteng) pada hari Rabu, 15 Nopember 2023
20.00 WIB. Bahkan kobaran api mulai membakar pondok warga dan mendekati
permukiman. BZK #RCTINews #SeputariNews #News #Karhutla #KebakaranHutan
#HutanKalimantan #SILVANUS_Italian_Pilot_Testing
example_title: Indonesia
- text: >-
Wildfire rages for a second day in Evia destroying a Natura 2000 protected
pine forest. - 5:51 PM Aug 14, 2019
example_title: English
- text: >-
3 nov 2023 21:57 - Incendio forestal obliga a la evacuación de hasta 850
personas cerca del pueblo de Montichelvo en Valencia.
example_title: Spanish
- text: >-
Incendi boschivi nell'est del Paese: 2 morti e oltre 50 case distrutte nello
stato del Queensland.
example_title: Italian
- text: >-
Lesné požiare na Sicílii si vyžiadali dva ľudské životy a evakuáciu hotela
http://dlvr.it/SwW3sC - 23. septembra 2023 20:57
example_title: Slovak
---
# xlm-roberta-base-ner-silvanus
This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the Indonesian NER dataset `id_nergrit_corpus`.
It achieves the following results on the evaluation set:
- Loss: 0.0567
- Precision: 0.9189
- Recall: 0.9273
- F1: 0.9231
- Accuracy: 0.9859
## Model description
The XLM-RoBERTa model was proposed in [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data.
- **Developed by:** See [associated paper](https://arxiv.org/abs/1911.02116)
- **Model type:** Multi-lingual model
- **Language(s) (NLP):** XLM-RoBERTa is a multilingual model pre-trained on 100 different languages; see the [GitHub Repo](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr) for the full list. This version is fine-tuned on an Indonesian NER dataset.
- **License:** MIT
- **Related Models:** [RoBERTa](https://huggingface.co/roberta-base), [XLM](https://huggingface.co/docs/transformers/model_doc/xlm)
- **Parent Model:** [XLM-RoBERTa](https://huggingface.co/xlm-roberta-base)
- **Resources for more information:** [GitHub Repo](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr)
## Intended uses & limitations
This model extracts locations, dates, and times from social media posts (e.g., Twitter) in multiple languages. Because it was fine-tuned only on Indonesian training data, its performance in the four other evaluated languages (English, Spanish, Italian, and Slovak) relies on zero-shot cross-lingual transfer and may be lower than on Indonesian text.
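The model can be used through the `transformers` token-classification pipeline. A minimal sketch follows; the Hub id below is a placeholder (replace it with this repository's full `owner/name` id), and the import is done lazily so the helper stays lightweight:

```python
def extract_entities(text, model_id="xlm-roberta-base-ner-silvanus"):
    """Run multilingual NER and return (label, text, score) entity spans.

    `model_id` is a placeholder; use the full Hub id of this repository.
    """
    from transformers import pipeline  # lazy import: only needed when called

    ner = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge B-/I- word pieces into whole entities
    )
    return [
        (e["entity_group"], e["word"], round(float(e["score"]), 3))
        for e in ner(text)
    ]

# Example (any of the five supported languages):
# extract_entities("Wildfire rages for a second day in Evia. - 5:51 PM Aug 14, 2019")
```

With `aggregation_strategy="simple"`, the pipeline merges sub-word tokens and B-/I- tags, so each returned span covers one whole entity such as a full place name.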
## Training and evaluation data
This model was fine-tuned on the Indonesian NER dataset `id_nergrit_corpus`, using the following BIO-style entity labels:

Abbreviation|Description
-|-
O|Outside of a named entity
B-LOC|Beginning of a location entity
I-LOC|Inside a location entity
B-DAT|Beginning of a date entity
I-DAT|Inside a date entity
B-TIM|Beginning of a time entity
I-TIM|Inside a time entity
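Under this scheme, token-level tags are grouped into entity spans: a `B-` tag opens a span, matching `I-` tags extend it, and `O` closes it. A minimal decoder sketch in plain Python (the tokens and tags below are an illustrative example, not model output):

```python
def bio_to_spans(tokens, tags):
    """Group BIO tags into (label, text) entity spans."""
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], [token])  # open a new span
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)  # extend the open span
        else:
            if current:  # "O" (or a stray I-) closes the open span
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(label, " ".join(words)) for label, words in spans]

tokens = ["Kebakaran", "di", "Kota", "Palangkaraya", "pada", "15", "Nopember", "2023"]
tags   = ["O", "O", "B-LOC", "I-LOC", "O", "B-DAT", "I-DAT", "I-DAT"]
print(bio_to_spans(tokens, tags))
# [('LOC', 'Kota Palangkaraya'), ('DAT', '15 Nopember 2023')]
```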
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
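For reproduction, these settings map onto the `transformers` `TrainingArguments` roughly as follows (a sketch, not the original training script; the output directory is an assumption, and the listed Adam betas/epsilon are the library defaults, so they need no explicit arguments):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="xlm-roberta-base-ner-silvanus",  # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```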
### Training results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| 0.1394 | 1.0 | 827 | 0.0559 | 0.8808 | 0.9257 | 0.9027 | 0.9842 |
| 0.0468 | 2.0 | 1654 | 0.0575 | 0.9107 | 0.9190 | 0.9148 | 0.9849 |
| 0.0279 | 3.0 | 2481 | 0.0567 | 0.9189 | 0.9273 | 0.9231 | 0.9859 |
### Framework versions
- Transformers 4.35.0
- Pytorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.14.1 |