---
base_model: readerbench/RoBERT-base
language:
- ro
tags:
- hate speech
- offensive language
- romanian
- classification
- nlp
- bert
metrics:
- accuracy
- precision
- recall
- f1_macro
- f1_micro
- f1_weighted
model-index:
- name: ro-offense
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: readerbench/ro-offense
      name: Romanian Offensive Language Dataset
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 0.8190
      name: Accuracy
    - type: precision
      value: 0.8138
      name: Precision
    - type: recall
      value: 0.8118
      name: Recall
    - type: f1_weighted
      value: 0.8189
      name: Weighted F1
    - type: f1_micro
      value: 0.8190
      name: Micro F1
    - type: f1_macro
      value: 0.8126
      name: Macro F1
---
# RO-Offense
This model is a fine-tuned version of [readerbench/RoBERT-base](https://huggingface.co/readerbench/RoBERT-base) on the [RO-Offense](https://huggingface.co/datasets/readerbench/ro-offense) dataset.
It achieves the following results on the evaluation set at the final training epoch (epoch 7):
- Loss: 0.8411
- Accuracy: 0.8232
- Precision: 0.8235
- Recall: 0.8210
- F1 Macro: 0.8207
- F1 Micro: 0.8232
- F1 Weighted: 0.8210
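
The three F1 variants differ only in how per-class scores are averaged. The sketch below is a minimal pure-Python illustration, not the evaluation code used for this model: the function name `f1_scores` and the toy labels are invented for the example. It also shows why accuracy and micro F1 coincide for single-label classification, as they do in the figures above.

```python
from collections import Counter


def f1_scores(y_true, y_pred, num_classes=4):
    """Return (micro, macro, weighted) F1 for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    per_class = [f1(tp[c], fp[c], fn[c]) for c in range(num_classes)]
    support = Counter(y_true)

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of the per-class F1 scores.
    macro = sum(per_class) / num_classes
    # Weighted: mean of the per-class F1 scores, weighted by class support.
    weighted = sum(per_class[c] * support[c] for c in range(num_classes)) / len(y_true)
    return micro, macro, weighted


# Toy labels invented for illustration only (not taken from RO-Offense).
y_true = [0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2]

micro, macro, weighted = f1_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.4f} micro={micro:.4f} macro={macro:.4f} weighted={weighted:.4f}")
```

In this toy case the rare class 3 is never predicted, which pulls macro F1 well below weighted F1. A gap between the two usually points to weaker performance on low-support classes.
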
Output labels:
- LABEL_0 = No offensive language
- LABEL_1 = Profanity (no directed insults)
- LABEL_2 = Insults (directed offensive language, lower level of offensiveness)
- LABEL_3 = Abuse (directed hate speech, racial slurs, sexist speech, threats of violence, death wishes, etc.)
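
The labels above can be mapped to readable names after inference. The following is a minimal sketch, assuming the model is loaded through the Transformers `pipeline` API. The helper names `readable` and `classify` are hypothetical, and the default `model_id` is an assumption about the Hub repository name (this card only confirms `readerbench/ro-offense` as the dataset id), so replace it with the actual model repository.

```python
# Human-readable names for the four output labels listed above.
LABEL_NAMES = {
    "LABEL_0": "No offensive language",
    "LABEL_1": "Profanity",
    "LABEL_2": "Insults",
    "LABEL_3": "Abuse",
}


def readable(prediction):
    """Map one pipeline result, e.g. {"label": "LABEL_2", "score": 0.91}, to a readable name."""
    return {"label": LABEL_NAMES[prediction["label"]], "score": prediction["score"]}


def classify(texts, model_id="readerbench/ro-offense"):
    """Classify a list of Romanian texts and return readable predictions.

    ASSUMPTION: `model_id` is a guess at the Hub repository name; it is not
    stated in this model card. Calling this function downloads the model.
    """
    # Imported lazily so the mapping helpers work without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return [readable(p) for p in classifier(texts)]
```

A call such as `classify(["Ce zi frumoasă!"])` would return one dictionary per input text, each holding the top label and its score.
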
## Model description
A Romanian BERT model ([RoBERT-base](https://huggingface.co/readerbench/RoBERT-base)) fine-tuned for offensive language classification.
It was trained on the [RO-Offense](https://huggingface.co/datasets/readerbench/ro-offense) dataset.
## Intended uses & limitations
Detection of offensive language and hate speech in Romanian text.
## Training and evaluation data
The model was trained on the train split of the [RO-Offense](https://huggingface.co/datasets/readerbench/ro-offense) dataset
and evaluated on its test split.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 64
- eval_batch_size: 128
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.2
- num_epochs: 10 (early stopping at epoch 7; best epoch: 4)
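
To make the scheduler settings concrete, here is a minimal sketch of a linear schedule with warmup using the values listed above. It assumes the schedule was planned over the full 10 epochs and takes 125 steps per epoch from the Step column of the results table. The function name `linear_lr` is invented for illustration, and this is not the training code itself.

```python
PEAK_LR = 4e-5          # learning_rate
WARMUP_RATIO = 0.2      # lr_scheduler_warmup_ratio
STEPS_PER_EPOCH = 125   # from the results table: step 125 at epoch 1.0
NUM_EPOCHS = 10         # num_epochs (ASSUMPTION: schedule planned over all 10)

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS            # 1250
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)      # 250


def linear_lr(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# Learning rate at the end of each epoch that was actually run (1 to 7).
for epoch in range(1, 8):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch}: step {step}, lr {linear_lr(step):.2e}")
```

Under these assumptions the peak rate is reached at step 250 (end of epoch 2), and training stopped at step 875 while the rate was still decaying.
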
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 Macro | F1 Micro | F1 Weighted |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:--------:|:--------:|:-----------:|
| No log | 1.0 | 125 | 0.7789 | 0.7037 | 0.6825 | 0.7000 | 0.6873 | 0.7037 | 0.7132 |
| No log | 2.0 | 250 | 0.5170 | 0.8006 | 0.8066 | 0.8016 | 0.7986 | 0.8006 | 0.7971 |
| No log | 3.0 | 375 | 0.5139 | 0.8096 | 0.8168 | 0.8237 | 0.8120 | 0.8096 | 0.8047 |
| 0.6074 | **4.0** | 500 | 0.6180 | 0.8247 | 0.8251 | 0.8187 | 0.8210 | 0.8247 | **0.8233** |
| 0.6074 | 5.0 | 625 | 0.7311 | 0.8096 | 0.8071 | 0.8085 | 0.8064 | 0.8096 | 0.8071 |
| 0.6074 | 6.0 | 750 | 0.8365 | 0.8101 | 0.8117 | 0.8191 | 0.8105 | 0.8101 | 0.8051 |
| 0.6074 | 7.0 | 875 | 0.8411 | 0.8232 | 0.8235 | 0.8210 | 0.8207 | 0.8232 | 0.8210 |
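
The stopping point in the table is consistent with early stopping on a validation metric. The sketch below replays the F1 Weighted column through a simple patience rule. Both the monitored metric and the patience of 3 are assumptions chosen because they reproduce the reported best epoch (4) and stop epoch (7); this card does not state the actual settings. The function name `early_stopping` is hypothetical.

```python
def early_stopping(scores, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed, for a higher-is-better metric."""
    best_epoch, best_score, waited = 0, float("-inf"), 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_epoch, best_score, waited = epoch, score, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(scores)


# F1 Weighted per epoch, copied from the table above.
f1_weighted = [0.7132, 0.7971, 0.8047, 0.8233, 0.8071, 0.8051, 0.8210]

# ASSUMPTION: patience=3 is inferred from the table, not documented.
best_epoch, stop_epoch = early_stopping(f1_weighted, patience=3)
print(best_epoch, stop_epoch)  # prints "4 7"
```

Replaying the Accuracy or F1 Macro column gives the same result, so the table alone does not identify which metric was monitored. Validation loss is lowest at epoch 3, so it does not match the reported best epoch.
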
### Framework versions
- Transformers 4.31.0
- PyTorch 2.0.1+cu118
- Datasets 2.14.3
- Tokenizers 0.13.3