|
--- |
|
language: |
|
- 'no' |
|
- nb |
|
- nn |
|
license: cc-by-4.0 |
|
pipeline_tag: token-classification |
|
--- |
|
# Targeted Sentiment Analysis model for Norwegian text |
|
This model is a fine-tuned version of [ltg/norbert3-large](https://huggingface.co/ltg/norbert3-large) for Targeted Sentiment Analysis (TSA) on Norwegian text. The fine-tuning script is available [on GitHub](https://github.com/egilron/seq-label.git).
|
In TSA, we identify sentiment targets: the things that are spoken about positively or negatively in each sentence. Our model performs the task through sequence labeling, also known as token classification.
|
|
|
The dataset used for fine-tuning is [ltg/norec_tsa](https://huggingface.co/datasets/ltg/norec_tsa) at its default settings, where sentiment targets are labeled as either "targ-Positive" or "targ-Negative". The norec_tsa dataset is derived from the [NoReC_fine dataset](https://github.com/ltgoslo/norec_fine).
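You can inspect the exact label scheme and data format directly from the dataset. The sketch below assumes the default configuration of `ltg/norec_tsa` and a standard `train` split; it prints the dataset's own schema rather than assuming specific column names.

```python
# Hedged sketch: inspect the fine-tuning data. Prints the dataset's own
# schema and one example rather than assuming specific column names.
from datasets import load_dataset

ds = load_dataset("ltg/norec_tsa")   # default configuration
print(ds["train"].features)          # column names and the full tag set
print(ds["train"][0])                # one tokenized sentence with per-token labels
```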
|
|
|
|
|
## Quick start |
|
You can use this model in your scripts as follows: |
|
```python
>>> from transformers import AutoTokenizer, pipeline

>>> origin = "ltg/norbert3-large_TSA"
>>> trust_remote = "norbert3" in origin.lower()  # norbert3 needs its custom model code
>>> text = "Hans hese , litt såre stemme kler bluesen , men denne platen kommer neppe til å bli blant hans største kommersielle suksesser ."

>>> pipe = pipeline(
...     "token-classification",
...     model=origin,
...     tokenizer=AutoTokenizer.from_pretrained(origin),
...     aggregation_strategy="first",   # merge sub-word tokens into whole-word target spans
...     trust_remote_code=trust_remote, # downloads the norbert3 model code
... )
>>> preds = pipe(text)
>>> for p in preds:
...     print(p)

{'entity_group': 'targ-Positive', 'score': 0.6990814, 'word': ' Hans hese , litt såre stemme', 'start': 0, 'end': 28}

{'entity_group': 'targ-Negative', 'score': 0.5721016, 'word': ' platen', 'start': 53, 'end': 60}

```
|
|
|
|
|
|
|
## Training hyperparameters |
|
- per_device_train_batch_size: 64 |
|
- per_device_eval_batch_size: 8 |
|
- learning_rate: 1e-05 |
|
- gradient_accumulation_steps: 1 |
|
- num_train_epochs: 24 (best epoch 18) |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
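
The hyperparameters above map onto Hugging Face `TrainingArguments` roughly as in the sketch below. This is not the actual fine-tuning script (see the linked seq-label repository for that); `output_dir` is a placeholder.

```python
# Hedged sketch: the hyperparameters above expressed as TrainingArguments.
# output_dir is a placeholder, not taken from the original fine-tuning script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="norbert3-large_TSA",   # placeholder
    per_device_train_batch_size=64,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    gradient_accumulation_steps=1,
    num_train_epochs=24,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```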
|
|
|
## Evaluation |
|
```
               precision    recall  f1-score   support

targ-Negative     0.4648    0.3143    0.3750       210
targ-Positive     0.5097    0.6019    0.5520       525

    micro avg     0.5013    0.5197    0.5104       735
    macro avg     0.4872    0.4581    0.4635       735
 weighted avg     0.4969    0.5197    0.5014       735
```
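
The report above is an entity-level classification report of the kind produced by seqeval. As an illustration only, the snippet below shows how such a report is computed from gold and predicted tag sequences; the two sequences are made up, and the exact tag names should be checked against the dataset.

```python
# Illustrative only: entity-level scores with seqeval. The tag sequences
# below are made-up examples, not the model's actual predictions.
from seqeval.metrics import classification_report

y_true = [["O", "B-targ-Positive", "I-targ-Positive", "O", "B-targ-Negative"]]
y_pred = [["O", "B-targ-Positive", "I-targ-Positive", "O", "O"]]
print(classification_report(y_true, y_pred, digits=4))
```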