---
license: apache-2.0
language:
- en
base_model:
- ntu-spml/distilhubert
pipeline_tag: audio-classification
library_name: transformers
---
# Emotion Detection From Speech

This model is a fine-tuned version of **DistilHuBERT** that classifies emotions from speech audio.

## Approach
1. **Dataset:** The **RAVDESS** (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset, comprising 1,440 speech audio files labeled with 8 emotions: calm, happy, sad, angry, fearful, surprise, neutral, and disgust.
2. **Model Fine-Tuning:** The DistilHuBERT model was fine-tuned for 7 epochs with a learning rate of 5e-5, achieving an accuracy of 98% on the test dataset.
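
RAVDESS encodes each file's emotion in its file name rather than in a separate label file. The file name has seven hyphen-separated fields (modality, vocal channel, emotion, intensity, statement, repetition, actor), and the third field is the emotion code. The sketch below shows one way to derive labels from file names. The mapping uses the dataset's official label names, and the helper `emotion_from_filename` is hypothetical. The label strings stored in the fine-tuned model's configuration may differ (for example, "surprise" versus "surprised").

```python
from pathlib import Path

# Official RAVDESS emotion codes (third field of each file name)
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def emotion_from_filename(path: str) -> str:
    """Return the emotion label encoded in a RAVDESS file name.

    File names look like 03-01-06-01-02-01-12.wav:
    modality-channel-emotion-intensity-statement-repetition-actor.
    """
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS file name: {path}")
    return RAVDESS_EMOTIONS[fields[2]]


print(emotion_from_filename("Actor_12/03-01-06-01-02-01-12.wav"))  # fearful
```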

## Data Preprocessing
- **Sampling Rate:** Audio files were resampled to 16 kHz, the sampling rate DistilHuBERT expects.
- **Padding:** Clips shorter than 30 seconds were zero-padded to a length of 30 seconds.
- **Train-Test Split:** 80% of the samples were used for training and 20% for testing.
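
The padding and split steps can be sketched in plain NumPy. This sketch makes several assumptions. It assumes clips have already been loaded as mono 16 kHz arrays (for example with `librosa.load(path, sr=16000)`), and it truncates clips longer than 30 seconds, which the card does not state. It also uses a simple seeded random split. Whether the original split was stratified by emotion or actor is not stated. The helper names and the seed are hypothetical.

```python
import numpy as np

SAMPLING_RATE = 16_000  # Hz, as stated in this card
MAX_SECONDS = 30.0      # padding target, as stated in this card


def pad_or_truncate(waveform: np.ndarray, sr: int = SAMPLING_RATE,
                    max_seconds: float = MAX_SECONDS) -> np.ndarray:
    """Zero-pad (or, as an assumption, truncate) a 1-D waveform to max_seconds."""
    target = int(sr * max_seconds)
    if len(waveform) >= target:
        return waveform[:target]
    # Append zeros at the end so every clip has the same length
    return np.pad(waveform, (0, target - len(waveform)))


def train_test_split(n_samples: int, test_fraction: float = 0.2, seed: int = 0):
    """Return shuffled (train, test) index arrays for a random 80/20 split."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


clip = np.random.default_rng(1).standard_normal(3 * SAMPLING_RATE)  # 3 s of fake audio
padded = pad_or_truncate(clip)
train_idx, test_idx = train_test_split(1440)
print(padded.shape, len(train_idx), len(test_idx))  # (480000,) 1152 288
```

With 1,440 files, an 80/20 split gives 1,152 training and 288 test samples.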

## Model Architecture
- **DistilHuBERT:** A distilled, lightweight variant of HuBERT, with a classification head fine-tuned for emotion recognition.
- **Fine-Tuning Setup:**
    - Optimizer: AdamW
    - Loss Function: Cross-Entropy
    - Learning Rate: 5e-5
    - Warm-up Ratio: 0.1
    - Epochs: 7
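
The card gives the peak learning rate, warm-up ratio, and epoch count, but it does not give the schedule shape or the batch size. The sketch below assumes linear warm-up followed by linear decay to zero, which matches the Hugging Face `Trainer` default (`lr_scheduler_type="linear"`). It also assumes a batch size of 32, which is hypothetical. The sketch computes the resulting step counts and the learning rate at any step. It does not reproduce the AdamW optimizer or the cross-entropy loss.

```python
import math

PEAK_LR = 5e-5        # learning rate from this card
WARMUP_RATIO = 0.1    # warm-up ratio from this card
EPOCHS = 7            # epochs from this card
TRAIN_SAMPLES = 1152  # 80% of the 1,440 files
BATCH_SIZE = 32       # hypothetical: not stated in this card


def lr_at_step(step: int, total_steps: int, warmup_steps: int,
               peak_lr: float = PEAK_LR) -> float:
    """Linear warm-up to peak_lr, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
print(total_steps, warmup_steps)  # 252 26
```

Under these assumptions, the learning rate climbs from 0 to 5e-5 over the first 26 of 252 optimizer steps. It then falls linearly back to 0 by the end of epoch 7.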
 
## Results
- **Accuracy:** 0.98 on the test set
- **Cross-entropy loss:** 0.10 on the test set
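
The card does not include evaluation code. The sketch below shows the standard definitions of both metrics, computed from raw classifier logits of shape (N, C) and integer labels. The function name and toy inputs are hypothetical.

```python
import numpy as np


def cross_entropy_and_accuracy(logits: np.ndarray, labels: np.ndarray):
    """Mean cross-entropy loss and accuracy for (N, C) logits and (N,) integer labels."""
    # Numerically stable log-softmax: subtract the per-row max before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of the true class, averaged over examples
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    accuracy = (logits.argmax(axis=1) == labels).mean()
    return float(loss), float(accuracy)


# Two toy examples over 8 emotion classes: the first is confidently correct, the second wrong
logits = np.zeros((2, 8))
logits[0, 3] = 10.0
logits[1, 5] = 10.0
labels = np.array([3, 2])
loss, acc = cross_entropy_and_accuracy(logits, labels)
print(round(loss, 3), acc)  # 5.0 0.5
```

A mean cross-entropy of 0.10 means that the geometric mean of the probability the model assigns to the correct class is about exp(-0.10) ≈ 0.90.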

## Usage
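```python
from transformers import pipeline

# Load the fine-tuned emotion classifier from the Hugging Face Hub
pipe = pipeline(
    "audio-classification",
    model="BilalHasan/distilhubert-finetuned-ravdess",
)

# Replace with the path to your own audio file (e.g. a WAV recording)
path_to_your_audio = "path/to/audio.wav"

# Returns a list of {"label": ..., "score": ...} dicts, one per predicted emotion
emotion = pipe(path_to_your_audio)
print(emotion)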
```
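
The pipeline returns a list of label/score dictionaries rather than a single emotion. The small helper below picks the highest-scoring entry without relying on the list being sorted. The helper `top_emotion` is hypothetical, and the example input is made-up data shaped like pipeline output, not a real prediction from this model.

```python
def top_emotion(predictions: list[dict]) -> tuple[str, float]:
    """Return (label, score) of the highest-scoring prediction."""
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]


# Hypothetical output shaped like an audio-classification pipeline result
example = [
    {"label": "calm", "score": 0.02},
    {"label": "happy", "score": 0.91},
    {"label": "surprised", "score": 0.05},
]
label, score = top_emotion(example)
print(f"{label} ({score:.0%})")  # happy (91%)
```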

## Demo
A live demo app is available on [Hugging Face Spaces](https://huggingface.co/spaces/BilalHasan/Mood-Based-Yoga-Session-Recommendation).