---
language:
  - 'no'
  - nb
  - nn
license: cc-by-4.0
pipeline_tag: token-classification
---
# Targeted Sentiment Analysis model for Norwegian text
This model is a fine-tuned version of ltg/norbert3-large for Targeted Sentiment Analysis (TSA) on Norwegian text. The fine-tuning script is available on GitHub.

In TSA, we identify sentiment targets, i.e. "that which is spoken about positively or negatively", in each sentence. Our model performs the task through sequence labeling, also known as "token classification".

The dataset used for fine-tuning is ltg/norec_tsa, at its default settings, where sentiment targets are labeled as either "targ-Positive" or "targ-Negative". The norec_tsa dataset is derived from the NoReC_fine dataset.
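To get a feel for the label scheme, you can inspect the dataset with the `datasets` library. This is a minimal sketch; the column names (`tokens`, `tsa_tags`) are assumptions about the norec_tsa schema and are not confirmed by this card:

```python
# Sketch: inspect the fine-tuning data with the `datasets` library.
# The column names ("tokens", "tsa_tags") are assumptions about the
# ltg/norec_tsa schema, not confirmed by this card.
from datasets import load_dataset

ds = load_dataset("ltg/norec_tsa")  # default configuration
example = ds["train"][0]
print(example["tokens"])    # assumed: the words of one sentence
print(example["tsa_tags"])  # assumed: BIO-style tags, e.g. "B-targ-Positive"
```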
## Quick start
You can use this model in your scripts as follows:
```python
>>> import transformers
>>> from transformers import AutoTokenizer
>>> origin = "ltg/norbert3-large_TSA"
>>> # norbert3 models ship custom modeling code, so remote code must be trusted
>>> trust_remote = "norbert3" in origin.lower()
>>> text = "Hans hese , litt såre stemme kler bluesen , men denne platen kommer neppe til å bli blant hans største kommersielle suksesser ."
>>> pipe = transformers.pipeline(
...     "token-classification",
...     aggregation_strategy="first",
...     model=origin,
...     trust_remote_code=trust_remote,  # downloads configurations for norbert3
...     tokenizer=AutoTokenizer.from_pretrained(origin),
... )
>>> preds = pipe(text)
>>> for p in preds:
...     print(p)
{'entity_group': 'targ-Positive', 'score': 0.6990814, 'word': ' Hans hese , litt såre stemme', 'start': 0, 'end': 28}
{'entity_group': 'targ-Negative', 'score': 0.5721016, 'word': ' platen', 'start': 53, 'end': 60}
```
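For downstream use you typically want (target, polarity) pairs rather than raw dicts. One way to get them, using the character offsets from the output above:

```python
>>> for p in preds:
...     target = text[p["start"]:p["end"]].strip()
...     polarity = p["entity_group"].split("-")[-1]
...     print(f"{target} -> {polarity}")
Hans hese , litt såre stemme -> Positive
platen -> Negative
```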
## Training hyperparameters
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 8
- learning_rate: 1e-05
- gradient_accumulation_steps: 1
- num_train_epochs: 24 (best epoch 18)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
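For reference, here is a sketch of how these settings map onto `transformers.TrainingArguments`. Only the values listed above come from this card; the remaining arguments (such as the output directory) are placeholder assumptions, not the authors' actual configuration:

```python
# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# output_dir and anything not listed above are placeholder assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="norbert3-large_TSA",  # placeholder
    per_device_train_batch_size=64,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    gradient_accumulation_steps=1,
    num_train_epochs=24,              # best epoch was 18
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```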
## Evaluation

Per-label precision, recall, F1 and support:

| label         | precision | recall | f1-score | support |
|---------------|-----------|--------|----------|---------|
| targ-Negative | 0.4648    | 0.3143 | 0.3750   | 210     |
| targ-Positive | 0.5097    | 0.6019 | 0.5520   | 525     |
| micro avg     | 0.5013    | 0.5197 | 0.5104   | 735     |
| macro avg     | 0.4872    | 0.4581 | 0.4635   | 735     |
| weighted avg  | 0.4969    | 0.5197 | 0.5014   | 735     |
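The table has the shape of an entity-level classification report. Below is a sketch of producing one with `seqeval`, under the assumption (not stated in this card) that the scores were computed this way; the gold and predicted tag sequences are toy examples:

```python
# Sketch: an entity-level report in the same format, via seqeval.
# Whether the card's numbers were produced exactly this way is an assumption.
from seqeval.metrics import classification_report

gold = [["O", "B-targ-Positive", "I-targ-Positive", "O", "B-targ-Negative"]]
pred = [["O", "B-targ-Positive", "O", "O", "B-targ-Negative"]]
print(classification_report(gold, pred, digits=4))
```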