---
library_name: transformers
datasets:
- iulik-pisik/audio_vreme
- iulik-pisik/horoscop_neti
language:
- ro
metrics:
- accuracy
pipeline_tag: text-classification
base_model: dumitrescustefan/bert-base-romanian-cased-v1
---
## Model Details
### Model Description
This model is a fine-tuned version of the pre-trained Romanian BERT model (`dumitrescustefan/bert-base-romanian-cased-v1`), specialized for sentiment classification of weather forecast and horoscope texts. It classifies each text into one of two categories: positive or negative.
- **Architecture:** BERT (Bidirectional Encoder Representations from Transformers)
- **Type:** Text Classification
- **Language:** Romanian
- **Base Model:** dumitrescustefan/bert-base-romanian-cased-v1
## Uses
This model is intended for:
1. Automatic sentiment classification in Romanian weather forecast and horoscope texts.
2. Evaluating the effectiveness of Automatic Speech Recognition (ASR) systems in preserving the overall sentiment and meaning of the original text.
3. Applications requiring rapid sentiment analysis in specific domains (meteorology and astrology) without the need for perfect text transcription.
The model is not suitable for:
1. Sentiment classification in domains other than weather forecasts and horoscopes.
2. Detailed analysis of emotional nuances or identification of specific emotions.
3. Use in contexts requiring extremely high transcription accuracy.
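A minimal inference sketch using the `transformers` pipeline API. The card does not state the published repo id of this checkpoint, so `model_id` is a placeholder to replace, and the helper names are illustrative:

```python
def load_classifier(model_id: str):
    """Build a text-classification pipeline for the fine-tuned checkpoint.

    `model_id` is a placeholder -- substitute the actual Hub repo id
    of this model.
    """
    from transformers import pipeline  # deferred so the pure helper below has no deps
    return pipeline("text-classification", model=model_id)

def to_label(prediction: dict) -> str:
    """Collapse one pipeline output dict into a plain lowercase label string."""
    return prediction["label"].lower()

# Example (requires downloading the checkpoint):
# clf = load_classifier("iulik-pisik/<model-id>")  # hypothetical id
# print(to_label(clf("Vremea va fi frumoasă și însorită.")[0]))
```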
## Training Details
### Training Data
The model was trained using two datasets:
- `iulik-pisik/audio_vreme`: Transcriptions of weather forecasts
- `iulik-pisik/horoscop_neti`: Transcriptions of horoscopes
The training data was automatically labeled using the OpenAI GPT-3.5 Turbo API. Neutral texts were excluded from the training set to focus on clear positive/negative distinctions.
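The labeling step could be sketched as below. The prompt wording and helper names are assumptions for illustration, not the authors' exact pipeline; only the GPT-3.5 Turbo model and the neutral-filtering rule come from the card:

```python
# Labels kept for training; neutral texts are filtered out per the card.
KEPT_LABELS = {"positive", "negative"}

def label_text(text: str) -> str:
    """Ask GPT-3.5 Turbo for a one-word sentiment label (assumed prompt)."""
    from openai import OpenAI  # deferred import; requires OPENAI_API_KEY in the env
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                "Classify the sentiment of this Romanian text as "
                f"'positive', 'negative' or 'neutral'. One word only.\n\n{text}"
            ),
        }],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

def keep_for_training(label: str) -> bool:
    """Keep only clear positive/negative examples, dropping neutral texts."""
    return label in KEPT_LABELS
```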
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The model was evaluated on:
1. A manually annotated subset held out from the training datasets.
2. Transcriptions generated by various custom Whisper models for Romanian ASR.
#### Metrics
The primary metric used for evaluation is accuracy.
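Accuracy here is simply the fraction of texts whose predicted label matches the reference label, which can be computed as:

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```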
### Results
- Overall accuracy on annotations: 0.9137
- Accuracy for weather texts: 0.9189
- Accuracy for horoscope texts: 0.8964
The model also demonstrated comparable performance on ASR transcriptions from the best-performing custom Whisper model, albeit slightly lower than on manual annotations.