---
library_name: transformers
datasets:
- iulik-pisik/audio_vreme
- iulik-pisik/horoscop_neti
language:
- ro
metrics:
- accuracy
pipeline_tag: text-classification
base_model: dumitrescustefan/bert-base-romanian-cased-v1
---
## Model Details
### Model Description
This model is a fine-tuned version of the pre-trained Romanian BERT model (`dumitrescustefan/bert-base-romanian-cased-v1`), specialized for sentiment classification of weather forecast and horoscope texts. It classifies each text into one of two categories: positive or negative.
- **Architecture:** BERT (Bidirectional Encoder Representations from Transformers)
- **Type:** Text Classification
- **Language:** Romanian
- **Base Model:** dumitrescustefan/bert-base-romanian-cased-v1
## Uses
This model is intended for:
1. Automatic sentiment classification in Romanian weather forecast and horoscope texts.
2. Evaluating the effectiveness of Automatic Speech Recognition (ASR) systems in preserving the overall sentiment and meaning of the original text.
3. Applications requiring rapid sentiment analysis in specific domains (meteorology and astrology) without the need for perfect text transcription.
The model is not suitable for:
1. Sentiment classification in domains other than weather forecasts and horoscopes.
2. Detailed analysis of emotional nuances or identification of specific emotions.
3. Use in contexts requiring extremely high transcription accuracy.
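A minimal inference sketch using the `transformers` pipeline API. The card does not state the published repo id of this checkpoint, so `model_id` is a placeholder to replace, and the helper names are illustrative:

```python
def load_classifier(model_id: str):
    """Build a text-classification pipeline for the fine-tuned checkpoint.

    `model_id` is a placeholder -- substitute the actual Hub repo id
    of this model.
    """
    from transformers import pipeline  # deferred so the pure helper below has no deps
    return pipeline("text-classification", model=model_id)

def to_label(prediction: dict) -> str:
    """Collapse one pipeline output dict into a plain lowercase label string."""
    return prediction["label"].lower()

# Example (requires downloading the checkpoint):
# clf = load_classifier("iulik-pisik/<model-id>")  # hypothetical id
# print(to_label(clf("Vremea va fi frumoasă și însorită.")[0]))
```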
## Training Details
### Training Data
The model was trained using two datasets:
- `iulik-pisik/audio_vreme`: Transcriptions of weather forecasts
- `iulik-pisik/horoscop_neti`: Transcriptions of horoscopes
The training data was automatically labeled using the OpenAI GPT-3.5 Turbo API. Neutral texts were excluded from the training set to focus on clear positive/negative distinctions.
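The labeling step could be sketched as below. The prompt wording and helper names are assumptions for illustration, not the authors' exact pipeline; only the GPT-3.5 Turbo model and the neutral-filtering rule come from the card:

```python
# Labels kept for training; neutral texts are filtered out per the card.
KEPT_LABELS = {"positive", "negative"}

def label_text(text: str) -> str:
    """Ask GPT-3.5 Turbo for a one-word sentiment label (assumed prompt)."""
    from openai import OpenAI  # deferred import; requires OPENAI_API_KEY in the env
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                "Classify the sentiment of this Romanian text as "
                f"'positive', 'negative' or 'neutral'. One word only.\n\n{text}"
            ),
        }],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

def keep_for_training(label: str) -> bool:
    """Keep only clear positive/negative examples, dropping neutral texts."""
    return label in KEPT_LABELS
```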
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The model was evaluated on:
1. A manually annotated subset held out from the training datasets.
2. Transcriptions generated by various custom Whisper models for Romanian ASR.
#### Metrics
The primary metric used for evaluation is accuracy.
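Accuracy here is simply the fraction of texts whose predicted label matches the reference label, which can be computed as:

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```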
### Results
- Overall accuracy on annotations: 0.9137
- Accuracy for weather texts: 0.9189
- Accuracy for horoscope texts: 0.8964
The model also demonstrated comparable performance on ASR transcriptions from the best-performing custom Whisper model, albeit slightly lower than on manual annotations.