File size: 2,592 Bytes
917e919 eb22a1c 917e919 eb22a1c 917e919 eb22a1c 917e919 9d08da9 eb22a1c 917e919 228bcf3 917e919 228bcf3 917e919 228bcf3 917e919 eb22a1c 917e919 228bcf3 9d08da9 917e919 eb22a1c 917e919 eb22a1c 917e919 eb22a1c 917e919 eb22a1c 917e919 eb22a1c 917e919 eb22a1c 917e919 eb22a1c 917e919 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 |
---
language: en
license: apache-2.0
tags:
- audio-classification
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: distil-wav2vec2-adult-child-cls-52m
results: []
---
# DistilWav2Vec2 Adult/Child Speech Classifier 52M
DistilWav2Vec2 Adult/Child Speech Classifier is an audio classification model based on the [wav2vec 2.0](https://arxiv.org/abs/2006.11477) architecture. This model is a distilled version of [wav2vec2-adult-child-cls](https://huggingface.co/bookbot/wav2vec2-adult-child-cls) on a private adult/child speech classification dataset.
This model was trained using HuggingFace's PyTorch framework. All training was done on a Tesla P100, provided by Kaggle. Training metrics were logged via Tensorboard.
## Model
| Model | #params | Arch. | Training/Validation data (text) |
| ------------------------------------- | ------- | ----------- | ----------------------------------------- |
| `distil-wav2vec2-adult-child-cls-52m` | 52M | wav2vec 2.0 | Adult/Child Speech Classification Dataset |
## Evaluation Results
The model achieves the following results on evaluation:
| Dataset | Loss | Accuracy | F1 |
| --------------------------------- | ------ | -------- | ------ |
| Adult/Child Speech Classification | 0.1301 | 96.03% | 0.9639 |
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- `learning_rate`: 3e-05
- `train_batch_size`: 32
- `eval_batch_size`: 32
- `seed`: 42
- `gradient_accumulation_steps`: 4
- `total_train_batch_size`: 128
- `optimizer`: Adam with `betas=(0.9,0.999)` and `epsilon=1e-08`
- `lr_scheduler_type`: linear
- `lr_scheduler_warmup_ratio`: 0.1
- `num_epochs`: 3
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
| :-----------: | :---: | :--: | :-------------: | :------: | :----: |
| 0.212 | 1.0 | 96 | 0.1561 | 0.9561 | 0.9596 |
| 0.1523 | 2.0 | 192 | 0.1408 | 0.9575 | 0.9616 |
| 0.0844 | 3.0 | 288 | 0.1301 | 0.9603 | 0.9639 |
## Disclaimer
Do consider the biases which came from pre-training datasets that may be carried over into the results of this model.
## Authors
DistilWav2Vec2 Adult/Child Speech Classifier was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development are done on Kaggle.
## Framework versions
- Transformers 4.16.2
- Pytorch 1.10.2+cu102
- Datasets 1.18.3
- Tokenizers 0.10.3
|