---
license: apache-2.0
language:
- en
base_model:
- ntu-spml/distilhubert
pipeline_tag: audio-classification
library_name: transformers
---
# Emotion Detection From Speech
This model is a fine-tuned version of **DistilHuBERT** that classifies emotions from audio input.
## Approach
1. **Dataset:** The **RAVDESS** dataset, comprising 1,440 audio files with 8 emotion labels: calm, happy, sad, angry, fearful, surprise, neutral, and disgust.
2. **Model Fine-Tuning:** The DistilHuBERT model was fine-tuned for 7 epochs with a learning rate of 5e-5, reaching 98% accuracy on the test set.
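
As an illustration of how the 8 labels can be obtained, RAVDESS encodes the emotion in the third field of each file name. The sketch below parses that field. The helper name `emotion_from_filename` is hypothetical, the label strings follow the list above, and the card does not say how the labels were actually extracted for this model.

```python
from pathlib import Path

# Emotion codes from the RAVDESS file-name convention (third field).
# Label strings follow this card's list; the model's own label names may differ.
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprise",
}


def emotion_from_filename(path: str) -> str:
    """Return the emotion label encoded in a RAVDESS file name (hypothetical helper)."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS file name: {path}")
    return RAVDESS_EMOTIONS[fields[2]]


print(emotion_from_filename("Actor_12/03-01-06-01-02-01-12.wav"))  # fearful
```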
## Data Preprocessing
- **Sampling Rate:** Audio files were resampled to 16 kHz to match the model's requirements.
- **Padding:** Audio clips shorter than 30 seconds were zero-padded to that length.
- **Train-Test Split:** 80% of the samples were used for training, and 20% for testing.
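
The padding and split steps above can be sketched with NumPy as follows. This is a minimal illustration, not the code used to train the model: the function names, the right-side padding, the truncation of longer clips, the shuffling, and the seed are all assumptions. Resampling to 16 kHz is not shown and would normally be done by an audio library when the file is loaded.

```python
import numpy as np

TARGET_SR = 16_000                      # sampling rate stated above
MAX_SECONDS = 30                        # padding length stated above
MAX_SAMPLES = TARGET_SR * MAX_SECONDS


def pad_or_truncate(waveform: np.ndarray, max_samples: int = MAX_SAMPLES) -> np.ndarray:
    """Zero-pad a mono waveform on the right, or cut it, to max_samples."""
    if len(waveform) >= max_samples:
        return waveform[:max_samples]   # assumption: longer clips are truncated
    return np.pad(waveform, (0, max_samples - len(waveform)))


def train_test_indices(n_items: int, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle item indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)   # assumption: seeded random shuffle
    order = rng.permutation(n_items)
    n_test = int(round(n_items * test_fraction))
    return order[n_test:], order[:n_test]


clip = np.ones(3 * TARGET_SR, dtype=np.float32)     # stand-in for a 3-second clip
padded = pad_or_truncate(clip)
train_idx, test_idx = train_test_indices(1440)
print(padded.shape, len(train_idx), len(test_idx))  # (480000,) 1152 288
```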
## Model Architecture
- **DistilHuBERT:** A lightweight variant of HuBERT, fine-tuned for emotion classification.
- **Fine-Tuning Setup:**
  - Optimizer: AdamW
  - Loss Function: Cross-Entropy
  - Learning Rate: 5e-5
  - Warm-up Ratio: 0.1
  - Epochs: 7
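
To show what a warm-up ratio of 0.1 means for the learning rate, the sketch below computes a schedule that rises linearly to 5e-5 over the first 10% of steps. The linear decay afterwards and the batch size of 8 are assumptions, since the card states neither, and `learning_rate` is a hypothetical helper rather than a library function.

```python
import math

PEAK_LR = 5e-5        # learning rate from the setup above
WARMUP_RATIO = 0.1    # warm-up ratio from the setup above
EPOCHS = 7            # epochs from the setup above


def learning_rate(step: int, total_steps: int) -> float:
    """Linear warm-up to PEAK_LR, then linear decay to zero (decay is assumed)."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = total_steps - step
    return PEAK_LR * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Hypothetical: 1,152 training clips (80% of 1,440) at a batch size of 8.
steps_per_epoch = math.ceil(1152 / 8)
total_steps = steps_per_epoch * EPOCHS
for step in (0, 50, 101, total_steps):
    print(step, learning_rate(step, total_steps))
```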
## Results
- **Accuracy:** 0.98 on the test dataset
- **Loss:** 0.10 on the test dataset
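
For reference, with a 20% split of 1,440 files the test set holds 288 clips, so an accuracy of 0.98 corresponds to roughly 282 correct predictions. The sketch below shows how accuracy and mean cross-entropy loss are typically computed from model logits. The toy logits are made up and `accuracy_and_loss` is a hypothetical helper, not the evaluation code behind the numbers above.

```python
import numpy as np


def accuracy_and_loss(logits: np.ndarray, labels: np.ndarray) -> tuple[float, float]:
    """Return (accuracy, mean cross-entropy) for logits of shape (n, classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    accuracy = (logits.argmax(axis=1) == labels).mean()
    return float(accuracy), float(loss)


# Toy example: 3 clips, 8 emotion classes, made-up logits.
logits = np.zeros((3, 8))
logits[0, 2] = logits[1, 5] = logits[2, 0] = 4.0
labels = np.array([2, 5, 1])            # the third clip is misclassified
acc, loss = accuracy_and_loss(logits, labels)
print(round(acc, 3), round(loss, 3))
```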
## Usage
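```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
pipe = pipeline(
    "audio-classification",
    model="BilalHasan/distilhubert-finetuned-ravdess",
)

# path_to_your_audio is a placeholder: set it to the path of a local audio file.
emotion = pipe(path_to_your_audio)
```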
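
The audio-classification pipeline returns a list of dictionaries, each with a `label` and a `score`. A small helper like the one below can pick the most likely emotion. The name `top_emotion` is hypothetical and the example values are made up, so real labels and scores will differ.

```python
def top_emotion(predictions: list[dict]) -> tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


# Made-up output in the pipeline's format; real labels and scores will differ.
example = [
    {"label": "happy", "score": 0.91},
    {"label": "calm", "score": 0.05},
    {"label": "neutral", "score": 0.04},
]
print(top_emotion(example))  # ('happy', 0.91)
```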
## Demo
You can access the live demo of the app on [Hugging Face Spaces](https://huggingface.co/spaces/BilalHasan/Mood-Based-Yoga-Session-Recommendation).