bookbot/distil-ast-audioset

@@ -1,75 +1,77 @@
 ---
-license: bsd-3-clause
 tags:
-- audio-classification
-- generated_from_trainer
 metrics:
-- f1
-- accuracy
-model-index:
-- name: distil-ast-audioset-2
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# distil-ast-audioset-2
-This model is a fine-tuned version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593) on the bookbot/audioset dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.3063
-- F1: 0.4876
-- Roc Auc: 0.7140
-- Accuracy: 0.0714
-- Map: 0.4743
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 3e-05
-- train_batch_size: 32
-- eval_batch_size: 32
-- seed: 0
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 128
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 10.0
-- mixed_precision_training: Native AMP
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss | F1     | Roc Auc | Accuracy | Map    |
-|:-------------:|:-----:|:----:|:---------------:|:------:|:-------:|:--------:|:------:|
-| 1.5521        | 1.0   | 153  | 0.7759          | 0.3929 | 0.6789  | 0.0209   | 0.3394 |
-| 0.7088        | 2.0   | 306  | 0.5183          | 0.4480 | 0.7162  | 0.0349   | 0.4047 |
-| 0.484         | 3.0   | 459  | 0.4342          | 0.4673 | 0.7241  | 0.0447   | 0.4348 |
-| 0.369         | 4.0   | 612  | 0.3847          | 0.4777 | 0.7332  | 0.0504   | 0.4463 |
-| 0.2943        | 5.0   | 765  | 0.3587          | 0.4838 | 0.7284  | 0.0572   | 0.4556 |
-| 0.2446        | 6.0   | 918  | 0.3415          | 0.4875 | 0.7296  | 0.0608   | 0.4628 |
-| 0.2099        | 7.0   | 1071 | 0.3273          | 0.4896 | 0.7246  | 0.0648   | 0.4682 |
-| 0.186         | 8.0   | 1224 | 0.3140          | 0.4888 | 0.7171  | 0.0689   | 0.4711 |
-| 0.1693        | 9.0   | 1377 | 0.3101          | 0.4887 | 0.7157  | 0.0703   | 0.4741 |
-| 0.1582        | 10.0  | 1530 | 0.3063          | 0.4876 | 0.7140  | 0.0714   | 0.4743 |
-### Framework versions
 - Transformers 4.27.0.dev0
 - Pytorch 1.13.1+cu117

 ---
+language: en
+license: apache-2.0
 tags:
+  - audio-classification
+  - generated_from_trainer
 metrics:
+  - accuracy
+  - f1
 ---
+# Distil Audio Spectrogram Transformer AudioSet
+Distil Audio Spectrogram Transformer AudioSet is an audio classification model based on the [Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) architecture. It is a distilled version of [MIT/ast-finetuned-audioset-10-10-0.4593](https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593), trained on the [AudioSet](https://research.google.com/audioset/download.html) dataset.
+This model was trained using Hugging Face Transformers with PyTorch. All training was done on a Google Cloud VM with an NVIDIA A100 GPU. The scripts used for training can be found in the [Files and versions](https://huggingface.co/bookbot/distil-ast-audioset/tree/main) tab, and the [Training metrics](https://huggingface.co/bookbot/distil-ast-audioset/tensorboard) were logged via TensorBoard.
+## Model
+| Model                 | #params | Arch.                         | Training/Validation data |
+| --------------------- | ------- | ----------------------------- | ------------------------ |
+| `distil-ast-audioset` | 44M     | Audio Spectrogram Transformer | AudioSet                 |
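The card does not include an inference snippet, so the following is a minimal sketch of how the checkpoint might be loaded, assuming it works with the standard `audio-classification` pipeline in Transformers. The helper name `classify_clip` is hypothetical, and the 16 kHz mono input is an assumption carried over from the teacher checkpoint rather than something this card states.

```python
def classify_clip(waveform, top_k=5):
    """Return the top_k predicted labels for a single audio clip.

    `waveform` is assumed to be a 1-D float NumPy array holding mono audio
    at the sampling rate the checkpoint's feature extractor expects
    (assumed to be 16 kHz; this is not confirmed by the model card).
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "audio-classification",
        model="bookbot/distil-ast-audioset",
    )
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    return classifier(waveform, top_k=top_k)
```

Calling `classify_clip` on a one-second array such as `numpy.zeros(16000, dtype=numpy.float32)` downloads the weights on first use and returns the highest-scoring labels. Building the pipeline once and reusing it would be preferable when classifying many clips.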
+## Evaluation Results
+The model achieves the following results on evaluation:
+| Model               | F1     | Roc Auc | Accuracy | mAP    |
+| ------------------- | ------ | ------- | -------- | ------ |
+| Distil-AST AudioSet | 0.4876 | 0.7140  | 0.0714   | 0.4743 |
+| AST AudioSet        | 0.4989 | 0.6905  | 0.1247   | 0.5603 |
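The accuracy figures in the table above are far lower than the F1 scores. One plausible reading, which the card does not confirm, is that accuracy is measured as exact-match (subset) accuracy over multi-label predictions, where a clip only counts as correct if every label matches. The sketch below uses made-up toy labels and a micro-averaged F1 (also an assumption) to show how the two metrics can diverge.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of clips whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives pooled over all clips and labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)


# Toy data (illustrative only): 4 clips, 3 possible labels each.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

print("subset accuracy:", subset_accuracy(y_true, y_pred))
print("micro F1:", round(micro_f1(y_true, y_pred), 4))
```

In this toy case only one of four clips is matched exactly, yet most individual labels are right, so F1 sits well above accuracy. With hundreds of possible labels per clip, as in AudioSet, exact matches become rarer still.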
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- `learning_rate`: 3e-05
+- `train_batch_size`: 32
+- `eval_batch_size`: 32
+- `seed`: 0
+- `gradient_accumulation_steps`: 4
+- `total_train_batch_size`: 128
+- `optimizer`: Adam with `betas=(0.9,0.999)` and `epsilon=1e-08`
+- `lr_scheduler_type`: linear
+- `lr_scheduler_warmup_ratio`: 0.1
+- `num_epochs`: 10.0
+- `mixed_precision_training`: Native AMP
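The relationship between these values can be worked through in a few lines. This is a sketch under stated assumptions: training on a single device (so the total batch is the per-device batch times the accumulation steps), 153 optimizer steps per epoch as read from the training results table, and a schedule with linear warmup followed by linear decay to zero. The function `linear_lr` is a hypothetical helper, not code from the training scripts.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 3e-5
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.1
NUM_EPOCHS = 10
# Read from the training results table (step 153 at epoch 1.0).
STEPS_PER_EPOCH = 153

# Assumption: one device, so effective batch = per-device batch x accumulation steps.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at optimizer step `step`: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / (total_steps - warmup_steps)


print("effective batch:", effective_batch)
print("total steps:", total_steps, "warmup steps:", warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, linear_lr(step))
```

Under these assumptions the effective batch matches the listed `total_train_batch_size` of 128, and the warmup phase spans roughly the first epoch before the rate decays linearly to zero at the final step.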
 ### Training results
+| Training Loss | Epoch | Step  | Validation Loss |   F1   | Roc Auc | Accuracy |  mAP   |
+| :-----------: | :---: | :---: | :-------------: | :----: | :-----: | :------: | :----: |
+|    1.5521     |  1.0  |  153  |     0.7759      | 0.3929 | 0.6789  |  0.0209  | 0.3394 |
+|    0.7088     |  2.0  |  306  |     0.5183      | 0.4480 | 0.7162  |  0.0349  | 0.4047 |
+|     0.484     |  3.0  |  459  |     0.4342      | 0.4673 | 0.7241  |  0.0447  | 0.4348 |
+|     0.369     |  4.0  |  612  |     0.3847      | 0.4777 | 0.7332  |  0.0504  | 0.4463 |
+|    0.2943     |  5.0  |  765  |     0.3587      | 0.4838 | 0.7284  |  0.0572  | 0.4556 |
+|    0.2446     |  6.0  |  918  |     0.3415      | 0.4875 | 0.7296  |  0.0608  | 0.4628 |
+|    0.2099     |  7.0  | 1071  |     0.3273      | 0.4896 | 0.7246  |  0.0648  | 0.4682 |
+|     0.186     |  8.0  | 1224  |     0.3140      | 0.4888 | 0.7171  |  0.0689  | 0.4711 |
+|    0.1693     |  9.0  | 1377  |     0.3101      | 0.4887 | 0.7157  |  0.0703  | 0.4741 |
+|    0.1582     | 10.0  | 1530  |     0.3063      | 0.4876 | 0.7140  |  0.0714  | 0.4743 |
+## Disclaimer
+Biases present in the pre-training datasets may carry over into this model's results and should be taken into account when using it.
+## Authors
+Distil Audio Spectrogram Transformer AudioSet was trained and evaluated by [Ananto Joyoadikusumo](https://anantoj.github.io), [David Samuel Setiawan](https://davidsamuell.github.io/), and [Wilson Wongso](https://wilsonwongso.dev/). All computation and development were done on Google Cloud.
+## Framework versions
 - Transformers 4.27.0.dev0
 - Pytorch 1.13.1+cu117