pablorodriper/video-vision-transformer


pablorodriper committed on Oct 26, 2022

Commit a0423c1 (parent: f95a99f)

Update README.md

Files changed (1)

README.md +36 -21


@@ -1,33 +1,48 @@
 ---
 library_name: keras
 ---
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-| Hyperparameters | Value |
-| :-- | :-- |
-| name | Adam |
-| learning_rate | 9.999999747378752e-05 |
-| decay | 0.0 |
-| beta_1 | 0.8999999761581421 |
-| beta_2 | 0.9990000128746033 |
-| epsilon | 1e-07 |
-| amsgrad | False |
-| training_precision | float32 |

 ---
+title: Video Vision Transformer on MedMNIST
+emoji: 🧑‍⚕️
+colorFrom: red
+colorTo: green
+sdk: gradio
+app_file: app.py
+pinned: false
+license: apache-2.0
 library_name: keras
 ---
+## Keras Implementation of Video Vision Transformer on MedMNIST
+This repo contains the model trained in [this Keras example on the Video Vision Transformer](https://keras.io/examples/vision/vivit/).
+## Background Information
+This example implements [ViViT: A Video Vision Transformer](https://arxiv.org/abs/2103.15691) by Arnab et al., a pure Transformer-based model for video classification. The authors propose a novel embedding scheme and a number of Transformer variants to model video clips.
+## Datasets
+We use the OrganMNIST3D subset (`organmnist3d`) of the [MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification](https://medmnist.com/) dataset: single-channel 28×28×28 volumes in 11 classes.
+## Training Parameters
+```python
+import tensorflow as tf
+
+# DATA
+DATASET_NAME = "organmnist3d"
+BATCH_SIZE = 32
+AUTO = tf.data.AUTOTUNE
+INPUT_SHAPE = (28, 28, 28, 1)
+NUM_CLASSES = 11
+# OPTIMIZER
+LEARNING_RATE = 1e-4
+WEIGHT_DECAY = 1e-5
+# TRAINING
+EPOCHS = 80
+# TUBELET EMBEDDING
+PATCH_SIZE = (8, 8, 8)
+# Non-overlapping 3D tubelets: (28 // 8) ** 3 = 27 tokens per volume
+NUM_PATCHES = (INPUT_SHAPE[0] // PATCH_SIZE[0]) ** 3
+# ViViT ARCHITECTURE
+LAYER_NORM_EPS = 1e-6
+PROJECTION_DIM = 128
+NUM_HEADS = 8
+NUM_LAYERS = 8
+```
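The `# TUBELET EMBEDDING` settings above correspond to the embedding scheme mentioned in the background section. That scheme splits each volume into non-overlapping 3D blocks and linearly projects each block to one token.

The NumPy sketch below illustrates the idea. It is not the code used to train this model, and it rests on the following assumptions:

- Input is channels-last.
- The stride equals `PATCH_SIZE`, and trailing voxels that do not fill a whole tubelet are dropped.
- A random matrix stands in for the learned projection weights.
- `tubelet_embed` is a hypothetical helper.

Numerically, it matches a 3D convolution whose kernel size and stride both equal `PATCH_SIZE`, with valid padding and no bias.

```python
import numpy as np


def tubelet_embed(volume, patch_size, projection):
    """Hypothetical helper: tubelet embedding of one (D, H, W, C) volume.

    Splits the volume into non-overlapping (pd, ph, pw) tubelets, flattens
    each one, and applies a shared linear projection of shape
    (pd * ph * pw * C, projection_dim). Returns (num_tokens, projection_dim).
    """
    pd, ph, pw = patch_size
    d, h, w, c = volume.shape
    nd, nh, nw = d // pd, h // ph, w // pw
    # Drop trailing voxels that do not fill a whole tubelet (like "valid" padding).
    v = volume[: nd * pd, : nh * ph, : nw * pw, :]
    # Split each spatial axis into (tubelet index, offset inside the tubelet),
    # then move the three tubelet indices in front of the three offsets.
    v = v.reshape(nd, pd, nh, ph, nw, pw, c).transpose(0, 2, 4, 1, 3, 5, 6)
    # One flattened row per tubelet, ordered by depth, then height, then width.
    patches = v.reshape(nd * nh * nw, pd * ph * pw * c)
    return patches @ projection


# Example with the README settings; random weights stand in for learned ones.
INPUT_SHAPE = (28, 28, 28, 1)
PATCH_SIZE = (8, 8, 8)
PROJECTION_DIM = 128

rng = np.random.default_rng(0)
volume = rng.random(INPUT_SHAPE, dtype=np.float32)
projection = rng.standard_normal(
    (int(np.prod(PATCH_SIZE)) * INPUT_SHAPE[-1], PROJECTION_DIM)
).astype(np.float32)
tokens = tubelet_embed(volume, PATCH_SIZE, projection)
print(tokens.shape)  # (27, 128)
```

With `INPUT_SHAPE = (28, 28, 28, 1)` and `PATCH_SIZE = (8, 8, 8)`, each axis holds 28 // 8 = 3 full tubelets. A volume therefore becomes 3 × 3 × 3 = 27 tokens of size `PROJECTION_DIM` = 128, and the last 4 voxels along each axis are not covered.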
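The Datasets section does not show how the OrganMNIST3D volumes are read. The sketch below is one possible way to turn a MedMNIST-style `.npz` archive into arrays matching `INPUT_SHAPE`. It relies on these assumptions:

- The archive stores `<split>_images` as uint8 arrays of shape (N, 28, 28, 28) and `<split>_labels` with shape (N, 1). This is our understanding of the MedMNIST file layout; the README does not state it.
- `load_medmnist3d_split` is a hypothetical helper.
- Rescaling intensities to [0, 1] is an assumed preprocessing step.

```python
import numpy as np


def load_medmnist3d_split(npz_path, split):
    """Hypothetical helper: load one split ("train", "val" or "test").

    Assumes the archive holds "<split>_images" (N, 28, 28, 28) uint8 and
    "<split>_labels" (N, 1) integer class indices.
    """
    with np.load(npz_path) as data:
        images = data[f"{split}_images"]
        labels = data[f"{split}_labels"]
    # Add a trailing channel axis to match INPUT_SHAPE = (28, 28, 28, 1)
    # and rescale uint8 intensities to floats in [0, 1].
    images = images[..., np.newaxis].astype(np.float32) / 255.0
    # Flatten (N, 1) labels into a 1-D vector of class indices.
    labels = labels.reshape(-1).astype(np.int64)
    return images, labels


# Usage (the file path is an assumption):
# x_train, y_train = load_medmnist3d_split("organmnist3d.npz", "train")
```

The resulting arrays could then be batched with `BATCH_SIZE`, for example through `tf.data.Dataset.from_tensor_slices`.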