metadata

license: mit
language:
  - en
pipeline_tag: text-classification
tags:
  - finance
  - financial-sentiment-analysis
  - sentiment-analysis
library_name: transformers
widget:
  - text: unemployment hits record low as job opportunities soar
  - text: unemployment hits record high as job opportunities suffers

Sentiment-xDistil is based on xtremedistil-l12-h384-uncased and fine-tuned to classify the sentiment of news headlines, using a dataset annotated by ChatGPT 3.5. It was built together with Topic-xDistil as a tool for filtering for financial news headlines and classifying their sentiment. The code used to train both models and build the dataset is found here.

Notes: The output label is one of Negative, Neutral, or Positive. The model is intended for English text.

Performance Results

Here are the performance metrics for both models on their respective test sets:

| Model | Test Set Size | Accuracy | F1 Score |
|---|---|---|---|
| `topic-xdistil-uncased` | 32 799 | 94.44 % | 92.59 % |
| `sentiment-xdistil-uncased` | 17 527 | 94.59 % | 93.44 % |
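
For readers who want to reproduce this kind of evaluation, here is a minimal sketch of how accuracy and F1 can be computed from predicted and reference labels. The card does not state how F1 was averaged across classes, so macro averaging is an assumption here, and the tiny label lists are made up purely for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels for illustration only, not taken from the real test set.
labels = ["Negative", "Neutral", "Positive"]
y_true = ["Positive", "Neutral", "Negative", "Positive", "Neutral", "Negative"]
y_pred = ["Positive", "Neutral", "Negative", "Neutral", "Neutral", "Negative"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```

If the reported F1 was weighted by class frequency instead, the per-class scores would be combined with class counts as weights rather than averaged equally.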

Data

The training data consists of 300k+ news headlines and tweets, annotated by ChatGPT 3.5, which has been shown to outperform crowd-workers on text annotation tasks.

The sentence labels are defined by the ChatGPT prompt as follows:

"""
[...]
Does the headline convey a Positive, Neutral, or Negative sentiment with \
regard to the current state or potential future impact on the economy or \
the asset described?
    - Positive sentiment headlines suggest growth, improvement, or \
stability in economic conditions.
    - Neutral sentiment headlines do not clearly indicate a positive or \
negative impact on the economy.
    - Negative sentiment headlines imply economic decline, uncertainty, \
or unfavorable conditions.
[...]
"""

Example Usage

Here's a simple example:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("hakonmh/sentiment-xdistil-uncased")
tokenizer = AutoTokenizer.from_pretrained("hakonmh/sentiment-xdistil-uncased")

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
inputs = tokenizer(SENTENCE, return_tensors="pt")
output = model(**inputs).logits
predicted_label = model.config.id2label[output.argmax(-1).item()]

print(predicted_label)

Positive
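
The example above only reports the top label. If a confidence for each label is also useful, the logits can be passed through a softmax. The sketch below does this in plain Python so it runs without downloading the model; the label order in `example_id2label` and the logit values are hypothetical, and the real mapping should be read from `model.config.id2label`.

```python
import math


def label_probabilities(logits, id2label):
    """Turn raw logits into a {label: probability} dict via softmax."""
    # Subtract the largest logit first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {id2label[i]: e / total for i, e in enumerate(exps)}


# Hypothetical label order and logits, for illustration only.
example_id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}
example_logits = [-2.1, 0.3, 4.0]

probs = label_probabilities(example_logits, example_id2label)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

With the real model, the same function could be called as `label_probabilities(output[0].tolist(), model.config.id2label)`, where `output` is the logits tensor from the example above.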

Or, as a pipeline together with Topic-xDistil:

from transformers import pipeline

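# "sentiment-analysis" is an alias for the "text-classification" pipeline task;
# the canonical name is used here because the topic model is not a sentiment model.
topic_classifier = pipeline("text-classification",
                            model="hakonmh/topic-xdistil-uncased",
                            tokenizer="hakonmh/topic-xdistil-uncased")
sentiment_classifier = pipeline("text-classification",
                                model="hakonmh/sentiment-xdistil-uncased",
                                tokenizer="hakonmh/sentiment-xdistil-uncased")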

SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"
print(topic_classifier(SENTENCE))
print(sentiment_classifier(SENTENCE))

[{'label': 'Economics', 'score': 0.9970171451568604}]
[{'label': 'Positive', 'score': 0.9997037053108215}]

Tested on transformers 4.30.1 and torch 2.0.0.
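
Since the two models are meant to work as a filter followed by a sentiment classifier, the two-stage flow might be sketched as below. This is an illustration under assumptions, not the author's code: `classify_financial_headlines`, `keep_topics`, and `min_topic_score` are hypothetical names, "Economics" is the only topic label confirmed by the output above, and the stub classifiers (including the "Other" label) exist only so the sketch runs without downloading the models.

```python
def classify_financial_headlines(headlines, topic_classifier, sentiment_classifier,
                                 keep_topics=("Economics",), min_topic_score=0.5):
    """Keep headlines whose topic is in keep_topics, then label their sentiment.

    Both classifiers are expected to return [{"label": ..., "score": ...}]
    for a single string, as the pipelines shown above do.
    """
    results = []
    for headline in headlines:
        topic = topic_classifier(headline)[0]
        # Stage 1: drop headlines outside the topics of interest.
        if topic["label"] not in keep_topics or topic["score"] < min_topic_score:
            continue
        # Stage 2: classify sentiment only for the headlines that remain.
        sentiment = sentiment_classifier(headline)[0]
        results.append({
            "headline": headline,
            "topic": topic["label"],
            "sentiment": sentiment["label"],
            "score": sentiment["score"],
        })
    return results


# Stub classifiers standing in for the real pipelines (illustration only).
def fake_topic(text):
    label = "Economics" if "growth" in text.lower() else "Other"
    return [{"label": label, "score": 0.99}]


def fake_sentiment(text):
    return [{"label": "Positive", "score": 0.98}]


headlines = [
    "Global Growth Surges as New Technologies Drive Innovation and Productivity!",
    "Local team wins championship after dramatic final",
]
kept = classify_financial_headlines(headlines, fake_topic, fake_sentiment)
print(kept)
```

To use the real models, pass `topic_classifier` and `sentiment_classifier` from the pipeline example in place of the stubs, and set `keep_topics` to the topic labels the topic model actually emits.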