iulik-pisik/romanian-bert-weather-horoscope

library_name: transformers
datasets:
  - iulik-pisik/audio_vreme
  - iulik-pisik/horoscop_neti
language:
  - ro
metrics:
  - accuracy
pipeline_tag: text-classification
base_model: dumitrescustefan/bert-base-romanian-cased-v1

Model Details

Model Description

This model is a fine-tuned version of the pre-trained Romanian BERT model dumitrescustefan/bert-base-romanian-cased-v1, specialized for binary sentiment classification of weather forecast and horoscope texts. It assigns each text one of two labels: positive or negative.

Architecture: BERT (Bidirectional Encoder Representations from Transformers)
Type: Text Classification
Language: Romanian
Base Model: dumitrescustefan/bert-base-romanian-cased-v1
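As a minimal sketch of how the model might be used for inference, the snippet below wraps the standard transformers text-classification pipeline. The repository id is the one shown at the top of this card; the helper names load_classifier and classify are hypothetical. The label strings the model returns depend on its configuration, which this card does not state, so the code passes them through unchanged.

```python
from typing import Callable, Dict, List

# Repository id as shown at the top of this card.
MODEL_ID = "iulik-pisik/romanian-bert-weather-horoscope"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so that classify() can be used and tested without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Dict]:
    """Run the classifier on a list of texts and pair each text with its prediction.

    `classifier` is expected to return one {"label": ..., "score": ...} dict
    per input text, which is what a transformers pipeline does for a list input.
    """
    predictions = classifier(texts)
    return [
        {"text": text, "label": pred["label"], "score": float(pred["score"])}
        for text, pred in zip(texts, predictions)
    ]


# Example usage (requires network access to download the model):
# clf = load_classifier()
# for row in classify(["Mâine va fi soare și cald în toată țara."], clf):
#     print(row)
```

Keeping the pipeline construction separate from classify makes it possible to swap in another callable, for example a stub during testing.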

Uses

This model is intended for:

Automatic sentiment classification in Romanian weather forecast and horoscope texts.
Evaluating the effectiveness of Automatic Speech Recognition (ASR) systems in preserving the overall sentiment and meaning of the original text.
Applications requiring rapid sentiment analysis in specific domains (meteorology and astrology) without the need for perfect text transcription.

The model is not suitable for:

Sentiment classification in domains other than weather forecasts and horoscopes.
Detailed analysis of emotional nuances or identification of specific emotions.
Use in contexts requiring extremely high transcription accuracy.

Training Details

Training Data

The model was trained using two datasets:

iulik-pisik/audio_vreme: Transcriptions of weather forecasts
iulik-pisik/horoscop_neti: Transcriptions of horoscopes

The training data was automatically labeled using the OpenAI GPT-3.5 Turbo API. Neutral texts were excluded from the training set to focus on clear positive/negative distinctions.
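The filtering step described above could look roughly like the following. This is only a sketch: the card does not give the label names produced by the automatic labeling or the integer ids used in training, so the LABEL2ID mapping, the row field names text and label, and the function build_binary_examples are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical label names and ids; the actual mapping is not stated in this card.
LABEL2ID = {"negative": 0, "positive": 1}


def build_binary_examples(rows: List[Dict[str, str]]) -> List[Dict]:
    """Keep only positive/negative rows and convert their labels to integer ids.

    Rows labeled "neutral" (or with any label outside LABEL2ID) are dropped.
    """
    examples = []
    for row in rows:
        label = row["label"].strip().lower()
        if label not in LABEL2ID:
            continue  # excludes neutral and unexpected labels
        examples.append({"text": row["text"], "label": LABEL2ID[label]})
    return examples
```

Normalizing case and whitespace before the lookup guards against small formatting differences in automatically generated labels.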

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on:

A manually annotated subset of the training datasets.
Transcriptions generated by various custom Whisper models for Romanian ASR.

Metrics

The primary metric used for evaluation is accuracy.
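For reference, accuracy is the fraction of texts whose predicted label matches the gold label. The sketch below computes it overall and per domain, assuming each evaluation record carries hypothetical fields domain, gold, and pred; it is not the evaluation script used for this model.

```python
from collections import defaultdict
from typing import Dict, List


def accuracy_by_domain(records: List[Dict[str, str]]) -> Dict[str, float]:
    """Return accuracy for all records ("overall") and for each domain.

    Each record is assumed to have the keys "domain", "gold", and "pred".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        hit = int(record["gold"] == record["pred"])
        # Count every record once overall and once under its own domain.
        for key in ("overall", record["domain"]):
            total[key] += 1
            correct[key] += hit
    return {key: correct[key] / total[key] for key in total}
```

With domains such as weather and horoscope, the returned dictionary has one entry per domain plus the overall entry, which mirrors the breakdown reported in the results.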

Results

Overall accuracy on manual annotations: 0.9137
Accuracy for weather texts: 0.9189
Accuracy for horoscope texts: 0.8964

On ASR transcriptions from the best-performing custom Whisper model, accuracy was comparable to, though slightly lower than, accuracy on the manual annotations.
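One possible way to quantify how well an ASR system preserves sentiment is to classify each reference text and its ASR transcription and measure how often the two predicted labels agree. The sketch below illustrates that idea; it is an assumption about how such a check could be done, not the procedure behind the results above. The function sentiment_agreement and the predict callable, which maps a list of texts to a list of labels, are hypothetical.

```python
from typing import Callable, List


def sentiment_agreement(
    reference_texts: List[str],
    asr_texts: List[str],
    predict: Callable[[List[str]], List[str]],
) -> float:
    """Fraction of pairs where the reference and ASR texts get the same label."""
    if len(reference_texts) != len(asr_texts):
        raise ValueError("reference_texts and asr_texts must have the same length")
    if not reference_texts:
        raise ValueError("at least one text pair is required")

    reference_labels = predict(reference_texts)
    asr_labels = predict(asr_texts)
    matches = sum(ref == hyp for ref, hyp in zip(reference_labels, asr_labels))
    return matches / len(reference_texts)
```

Unlike accuracy, this agreement rate needs no gold labels, so it can be computed on any set of paired reference and transcribed texts.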