---
library_name: transformers
datasets:
- iulik-pisik/audio_vreme
- iulik-pisik/horoscop_neti
language:
- ro
metrics:
- accuracy
pipeline_tag: text-classification
base_model: dumitrescustefan/bert-base-romanian-cased-v1
---

## Model Details

### Model Description

This model is a fine-tuned version of the pre-trained Romanian BERT model (bert-base-romanian-cased-v1), specialized for sentiment classification of weather forecast and horoscope texts. It classifies texts into two categories: positive and negative.

- **Architecture:** BERT (Bidirectional Encoder Representations from Transformers)
- **Type:** Text Classification
- **Language:** Romanian
- **Base Model:** dumitrescustefan/bert-base-romanian-cased-v1
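
A minimal inference sketch with the `transformers` pipeline is shown below. The repo id is a placeholder for wherever this checkpoint is published, and the exact label names depend on the fine-tuning configuration:

```python
from transformers import pipeline

# Load the fine-tuned classifier; replace the placeholder repo id with the
# actual location of this checkpoint (hypothetical, not the real id).
classifier = pipeline(
    "text-classification",
    model="your-username/romanian-sentiment-weather-horoscope",
)

# Romanian example: "Tomorrow will be sunny, with pleasant temperatures."
result = classifier("Mâine va fi soare, cu temperaturi plăcute.")
print(result)  # e.g. [{'label': 'positive', 'score': 0.98}] -- label names may differ
```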

## Uses

This model is intended for:

1. Automatic sentiment classification of Romanian weather forecast and horoscope texts.
2. Evaluating how well Automatic Speech Recognition (ASR) systems preserve the overall sentiment and meaning of the original text (see the sketch after this list).
3. Applications requiring rapid sentiment analysis in specific domains (meteorology and astrology) without the need for perfect text transcription.
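
For use case 2, one possible approach is to compare predicted labels on reference transcripts against labels on ASR output and report the agreement rate. The sketch below assumes the hypothetical repo id from above and illustrative paired transcripts:

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="your-username/romanian-sentiment-weather-horoscope",  # hypothetical repo id
)

# Hypothetical paired data: human reference transcripts and ASR output.
reference_texts = ["Mâine va fi soare, cu temperaturi plăcute."]
asr_texts = ["mâine va fi soare cu temperaturi plăcute"]

# Sentiment is "preserved" when both versions receive the same predicted label.
ref_labels = [r["label"] for r in classifier(reference_texts)]
asr_labels = [r["label"] for r in classifier(asr_texts)]
agreement = sum(a == b for a, b in zip(ref_labels, asr_labels)) / len(ref_labels)
print(f"Sentiment agreement: {agreement:.2%}")
```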

The model is not suitable for:

1. Sentiment classification in domains other than weather forecasts and horoscopes.
2. Detailed analysis of emotional nuances or identification of specific emotions.
3. Use in contexts requiring extremely high transcription accuracy.

## Training Details

### Training Data

The model was trained using two datasets:

- `iulik-pisik/audio_vreme`: transcriptions of weather forecasts
- `iulik-pisik/horoscop_neti`: transcriptions of horoscopes
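
Both datasets are available on the Hugging Face Hub and can be loaded directly; split names are assumed to follow the usual conventions:

```python
from datasets import load_dataset

# Load the two source datasets from the Hugging Face Hub.
vreme = load_dataset("iulik-pisik/audio_vreme")
horoscop = load_dataset("iulik-pisik/horoscop_neti")

print(vreme)     # inspect available splits and columns
print(horoscop)
```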

The training data was automatically labeled using the OpenAI GPT-3.5 Turbo API. Neutral texts were excluded from the training set to focus on clear positive/negative distinctions.
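
The exact labeling prompt is not documented; the sketch below only illustrates the general approach with the `openai` client, using a hypothetical prompt and discarding neutral texts as described above:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def label_sentiment(text: str) -> str:
    """Ask GPT-3.5 Turbo for a one-word sentiment label (hypothetical prompt)."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "Classify the sentiment of the following Romanian text "
                           "as positive, negative, or neutral. Answer with one word.",
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

# Keep only clearly positive/negative examples, as in the training setup.
texts = ["Mâine va fi soare, cu temperaturi plăcute."]
labeled = [(t, label_sentiment(t)) for t in texts]
labeled = [(t, l) for t, l in labeled if l in {"positive", "negative"}]
```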

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on:

1. A subset of manually annotated examples from the training datasets.
2. Transcriptions generated by various custom Whisper models for Romanian ASR.

#### Metrics

The primary evaluation metric is accuracy.
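
Accuracy can be reproduced with the `evaluate` library; the label lists below are placeholders:

```python
import evaluate

accuracy = evaluate.load("accuracy")

# Placeholder integer labels (0 = negative, 1 = positive); substitute real
# model predictions and gold annotations.
predictions = [1, 0, 1, 1]
references = [1, 0, 0, 1]

print(accuracy.compute(predictions=predictions, references=references))
# {'accuracy': 0.75}
```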

### Results

- Overall accuracy on manual annotations: 0.9137
- Accuracy on weather texts: 0.9189
- Accuracy on horoscope texts: 0.8964

The model also performed comparably on ASR transcriptions from the best-performing custom Whisper model, albeit slightly below its accuracy on manual annotations.