Commit a0423c1 by pablorodriper
Parent: f95a99f

Update README.md

README.md CHANGED
@@ -1,33 +1,48 @@
 ---
 library_name: keras
 ---

-##

-

-##

-

-## Training

-

-

-
-
-
-
-| Hyperparameters | Value |
-| :-- | :-- |
-| name | Adam |
-| learning_rate | 9.999999747378752e-05 |
-| decay | 0.0 |
-| beta_1 | 0.8999999761581421 |
-| beta_2 | 0.9990000128746033 |
-| epsilon | 1e-07 |
-| amsgrad | False |
-| training_precision | float32 |

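The long decimals in the removed hyperparameter table (for example `9.999999747378752e-05` for the learning rate) look like round values stored in single precision and printed at double precision. A minimal sketch of that effect, assuming the optimizer settings were held as float32 (the table lists `training_precision | float32`, but the diff does not say how the table was produced); the helper name `as_float32` is hypothetical:

```python
import struct


def as_float32(value: float) -> float:
    """Round a Python float to the nearest float32 and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# Nominal values that would produce the table entries under the float32 assumption.
nominal_values = {
    "learning_rate": 1e-4,
    "beta_1": 0.9,
    "beta_2": 0.999,
}

for name, nominal in nominal_values.items():
    print(f"{name}: {nominal} -> {as_float32(nominal)!r}")
```

Under that assumption, the table's learning rate corresponds to a nominal `1e-4`, and `beta_1` and `beta_2` to `0.9` and `0.999`.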
 ---
+title: Video Vision Transformer on medmnist
+emoji: 🧑‍⚕️
+colorFrom: red
+colorTo: green
+sdk: gradio
+app_file: app.py
+pinned: false
+license: apache-2.0
 library_name: keras
 ---

+## Keras Implementation of Video Vision Transformer on medmnist

+This repo contains the model [to this Keras example on Video Vision Transformer](https://keras.io/examples/vision/vivit/).

+## Background Information
+This example implements [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Arnab et al., a pure Transformer-based model for video classification. The authors propose a novel embedding scheme and a number of Transformer variants to model video clips.

+## Datasets
+We use the [MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification](https://medmnist.com/) dataset.

+## Training Parameters
+```
+# DATA
+DATASET_NAME = "organmnist3d"
+BATCH_SIZE = 32
+AUTO = tf.data.AUTOTUNE
+INPUT_SHAPE = (28, 28, 28, 1)
+NUM_CLASSES = 11

+# OPTIMIZER
+LEARNING_RATE = 1e-4
+WEIGHT_DECAY = 1e-5

+# TRAINING
+EPOCHS = 80

+# TUBELET EMBEDDING
+PATCH_SIZE = (8, 8, 8)
+NUM_PATCHES = (INPUT_SHAPE[0] // PATCH_SIZE[0]) ** 2

+# ViViT ARCHITECTURE
+LAYER_NORM_EPS = 1e-6
+PROJECTION_DIM = 128
+NUM_HEADS = 8
+NUM_LAYERS = 8
+```
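The `NUM_PATCHES` expression in the added README can be checked by hand. A minimal sketch, assuming non-overlapping tubelets (stride equal to `PATCH_SIZE`) and no padding, neither of which is stated in this commit; `tokens_per_axis` and `num_tubelets` are illustrative names, not part of the README:

```python
INPUT_SHAPE = (28, 28, 28, 1)
PATCH_SIZE = (8, 8, 8)

# Formula exactly as written in the README.
NUM_PATCHES = (INPUT_SHAPE[0] // PATCH_SIZE[0]) ** 2

# Assumption: one token per non-overlapping tubelet, no padding,
# so each of the three volume axes contributes dim // patch positions.
tokens_per_axis = [dim // patch for dim, patch in zip(INPUT_SHAPE[:3], PATCH_SIZE)]

num_tubelets = 1
for count in tokens_per_axis:
    num_tubelets *= count

print(NUM_PATCHES)      # 9
print(tokens_per_axis)  # [3, 3, 3]
print(num_tubelets)     # 27
```

Under these assumptions the README formula counts a 3 x 3 grid over two axes (9), while tiling all three axes gives 27 tubelets, and 4 voxels per axis (28 - 3 * 8) fall outside any tubelet. Which count the model actually relies on depends on code that is not part of this diff.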