Spaces: keras-io/conv-lstm


nouamanetazi HF staff committed on Feb 13, 2022

Commit

b800b58

1 Parent(s): e06c361

initial commit

Files changed (2)

.gitignore +161 -0
conv_lstm.ipynb +430 -0

.gitignore ADDED

	@@ -0,0 +1,161 @@

+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+# C extensions
+*.so
+# tests and logs
+tests/fixtures/cached_*_text.txt
+logs/
+lightning_logs/
+lang_code_data/
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+# Translations
+*.mo
+*.pot
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+# Flask stuff:
+instance/
+.webassets-cache
+# Scrapy stuff:
+.scrapy
+# Sphinx documentation
+docs/_build/
+# PyBuilder
+target/
+# Jupyter Notebook
+.ipynb_checkpoints
+# IPython
+profile_default/
+ipython_config.py
+# pyenv
+.python-version
+# celery beat schedule file
+celerybeat-schedule
+# SageMath parsed files
+*.sage.py
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+# Spyder project settings
+.spyderproject
+.spyproject
+# Rope project settings
+.ropeproject
+# mkdocs documentation
+/site
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+# Pyre type checker
+.pyre/
+# vscode
+.vs
+.vscode
+# Pycharm
+.idea
+# TF code
+tensorflow_code
+# Models
+proc_data
+# examples
+runs
+/runs_old
+/wandb
+/examples/runs
+/examples/**/*.args
+/examples/rag/sweep
+# data
+/data
+serialization_dir
+# emacs
+*.*~
+debug.env
+# vim
+.*.swp
+#ctags
+tags
+# pre-commit
+.pre-commit*
+# .lock
+*.lock

conv_lstm.ipynb ADDED

	@@ -0,0 +1,430 @@

+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "5iJqHKEQx66F"
+      },
+      "source": [
+        "# Next-Frame Video Prediction with Convolutional LSTMs\n",
+        "\n",
+        "**Author:** [Amogh Joshi](https://github.com/amogh7joshi)<br>\n",
+        "**Date created:** 2021/06/02<br>\n",
+        "**Last modified:** 2021/06/05<br>\n",
+        "**Description:** How to build and train a convolutional LSTM model for next-frame video prediction."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "9vv8zp4vx66K"
+      },
+      "source": [
+        "## Introduction\n",
+        "\n",
+        "The\n",
+        "[Convolutional LSTM](https://papers.nips.cc/paper/2015/file/07563a3fe3bbe7e3ba84431ad9d055af-Paper.pdf)\n",
+        "architectures bring together time series processing and computer vision by\n",
+        "introducing a convolutional recurrent cell in an LSTM layer. In this example, we will explore the\n",
+        "Convolutional LSTM model in an application to next-frame prediction, the process\n",
+        "of predicting what video frames come next given a series of past frames."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "daG-n305x66K"
+      },
+      "source": [
+        "## Setup"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "4Xx9qttUx66L"
+      },
+      "outputs": [],
+      "source": [
+        "import numpy as np\n",
+        "import matplotlib.pyplot as plt\n",
+        "\n",
+        "import tensorflow as tf\n",
+        "from tensorflow import keras\n",
+        "from tensorflow.keras import layers\n",
+        "\n",
+        "import io\n",
+        "import imageio\n",
+        "from IPython.display import Image, display\n",
+        "from ipywidgets import widgets, Layout, HBox"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "w-uOOdg1x66M"
+      },
+      "source": [
+        "## Dataset Construction\n",
+        "\n",
+        "For this example, we will be using the\n",
+        "[Moving MNIST](http://www.cs.toronto.edu/~nitish/unsupervised_video/)\n",
+        "dataset.\n",
+        "\n",
+        "We will download the dataset and then construct and\n",
+        "preprocess training and validation sets.\n",
+        "\n",
+        "For next-frame prediction, our model will be using a previous frame,\n",
+        "which we'll call `f_n`, to predict a new frame, called `f_(n + 1)`.\n",
+        "To allow the model to create these predictions, we'll need to process\n",
+        "the data such that we have \"shifted\" inputs and outputs, where the\n",
+        "input `x` at time step `n` is frame `f_n` and the target `y` at that step is frame `f_(n + 1)`."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "H6_vt6q4x66N"
+      },
+      "outputs": [],
+      "source": [
+        "# Download and load the dataset.\n",
+        "fpath = keras.utils.get_file(\n",
+        "    \"moving_mnist.npy\",\n",
+        "    \"http://www.cs.toronto.edu/~nitish/unsupervised_video/mnist_test_seq.npy\",\n",
+        ")\n",
+        "dataset = np.load(fpath)\n",
+        "\n",
+        "# Swap the axes representing the number of frames and number of data samples.\n",
+        "dataset = np.swapaxes(dataset, 0, 1)\n",
+        "# We'll pick out 1000 of the 10000 total examples and use those.\n",
+        "dataset = dataset[:1000, ...]\n",
+        "# Add a channel dimension since the images are grayscale.\n",
+        "dataset = np.expand_dims(dataset, axis=-1)\n",
+        "\n",
+        "# Split into train and validation sets using indexing to optimize memory.\n",
+        "indexes = np.arange(dataset.shape[0])\n",
+        "np.random.shuffle(indexes)\n",
+        "train_index = indexes[: int(0.9 * dataset.shape[0])]\n",
+        "val_index = indexes[int(0.9 * dataset.shape[0]) :]\n",
+        "train_dataset = dataset[train_index]\n",
+        "val_dataset = dataset[val_index]\n",
+        "\n",
+        "# Normalize the data to the 0-1 range.\n",
+        "train_dataset = train_dataset / 255\n",
+        "val_dataset = val_dataset / 255\n",
+        "\n",
+        "# We'll define a helper function to shift the frames, where `x` is\n",
+        "# frames 0 to n - 1 and `y` is frames 1 to n (`n` is the last frame index).\n",
+        "def create_shifted_frames(data):\n",
+        "    x = data[:, 0 : data.shape[1] - 1, :, :]\n",
+        "    y = data[:, 1 : data.shape[1], :, :]\n",
+        "    return x, y\n",
+        "\n",
+        "\n",
+        "# Apply the processing function to the datasets.\n",
+        "x_train, y_train = create_shifted_frames(train_dataset)\n",
+        "x_val, y_val = create_shifted_frames(val_dataset)\n",
+        "\n",
+        "# Inspect the dataset.\n",
+        "print(\"Training Dataset Shapes: \" + str(x_train.shape) + \", \" + str(y_train.shape))\n",
+        "print(\"Validation Dataset Shapes: \" + str(x_val.shape) + \", \" + str(y_val.shape))"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "wJhm7oM7x66O"
+      },
+      "source": [
+        "## Data Visualization\n",
+        "\n",
+        "Our data consists of sequences of frames, each of which\n",
+        "is used to predict the upcoming frame. Let's take a look\n",
+        "at some of these sequential frames."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "jFE2fY1xx66O"
+      },
+      "outputs": [],
+      "source": [
+        "# Construct a figure on which we will visualize the images.\n",
+        "fig, axes = plt.subplots(4, 5, figsize=(10, 8))\n",
+        "\n",
+        "# Plot each of the sequential images for one random data example.\n",
+        "data_choice = np.random.choice(range(len(train_dataset)), size=1)[0]\n",
+        "for idx, ax in enumerate(axes.flat):\n",
+        "    ax.imshow(np.squeeze(train_dataset[data_choice][idx]), cmap=\"gray\")\n",
+        "    ax.set_title(f\"Frame {idx + 1}\")\n",
+        "    ax.axis(\"off\")\n",
+        "\n",
+        "# Print information and display the figure.\n",
+        "print(f\"Displaying frames for example {data_choice}.\")\n",
+        "plt.show()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "jPQQIUm6x66P"
+      },
+      "source": [
+        "## Model Construction\n",
+        "\n",
+        "To build a Convolutional LSTM model, we will use the\n",
+        "`ConvLSTM2D` layer, which will accept inputs of shape\n",
+        "`(batch_size, num_frames, height, width, channels)`, and return\n",
+        "a prediction movie of the same shape."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "D3OvRaVpx66P"
+      },
+      "outputs": [],
+      "source": [
+        "# Construct the input layer with no fixed number of frames.\n",
+        "inp = layers.Input(shape=(None, *x_train.shape[2:]))\n",
+        "\n",
+        "# We will construct 3 `ConvLSTM2D` layers with batch normalization,\n",
+        "# followed by a `Conv3D` layer for the spatiotemporal outputs.\n",
+        "x = layers.ConvLSTM2D(\n",
+        "    filters=64,\n",
+        "    kernel_size=(5, 5),\n",
+        "    padding=\"same\",\n",
+        "    return_sequences=True,\n",
+        "    activation=\"relu\",\n",
+        ")(inp)\n",
+        "x = layers.BatchNormalization()(x)\n",
+        "x = layers.ConvLSTM2D(\n",
+        "    filters=64,\n",
+        "    kernel_size=(3, 3),\n",
+        "    padding=\"same\",\n",
+        "    return_sequences=True,\n",
+        "    activation=\"relu\",\n",
+        ")(x)\n",
+        "x = layers.BatchNormalization()(x)\n",
+        "x = layers.ConvLSTM2D(\n",
+        "    filters=64,\n",
+        "    kernel_size=(1, 1),\n",
+        "    padding=\"same\",\n",
+        "    return_sequences=True,\n",
+        "    activation=\"relu\",\n",
+        ")(x)\n",
+        "x = layers.Conv3D(\n",
+        "    filters=1, kernel_size=(3, 3, 3), activation=\"sigmoid\", padding=\"same\"\n",
+        ")(x)\n",
+        "\n",
+        "# Next, we will build the complete model and compile it.\n",
+        "model = keras.models.Model(inp, x)\n",
+        "model.compile(\n",
+        "    loss=keras.losses.binary_crossentropy, optimizer=keras.optimizers.Adam(),\n",
+        ")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "Nd0VLhrvx66Q"
+      },
+      "source": [
+        "## Model Training\n",
+        "\n",
+        "With our model and data constructed, we can now train the model."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "v9U57leux66Q"
+      },
+      "outputs": [],
+      "source": [
+        "# Define some callbacks to improve training.\n",
+        "early_stopping = keras.callbacks.EarlyStopping(monitor=\"val_loss\", patience=10)\n",
+        "reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor=\"val_loss\", patience=5)\n",
+        "\n",
+        "# Define modifiable training hyperparameters.\n",
+        "epochs = 20\n",
+        "batch_size = 5\n",
+        "\n",
+        "# Fit the model to the training data.\n",
+        "model.fit(\n",
+        "    x_train,\n",
+        "    y_train,\n",
+        "    batch_size=batch_size,\n",
+        "    epochs=epochs,\n",
+        "    validation_data=(x_val, y_val),\n",
+        "    callbacks=[early_stopping, reduce_lr],\n",
+        ")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "RxB7zZIxx66R"
+      },
+      "source": [
+        "## Frame Prediction Visualizations\n",
+        "\n",
+        "With our model now constructed and trained, we can generate\n",
+        "some example frame predictions based on a new video.\n",
+        "\n",
+        "We'll pick a random example from the validation set and\n",
+        "then choose the first ten frames from it. From there, we can\n",
+        "allow the model to predict 10 new frames, which we can compare\n",
+        "to the ground truth frames."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "qsujRd4Ex66R"
+      },
+      "outputs": [],
+      "source": [
+        "# Select a random example from the validation dataset.\n",
+        "example = val_dataset[np.random.choice(range(len(val_dataset)), size=1)[0]]\n",
+        "\n",
+        "# Pick the first/last ten frames from the example.\n",
+        "frames = example[:10, ...]\n",
+        "original_frames = example[10:, ...]\n",
+        "\n",
+        "# Predict a new set of 10 frames.\n",
+        "for _ in range(10):\n",
+        "    # Extract the model's prediction and post-process it.\n",
+        "    new_prediction = model.predict(np.expand_dims(frames, axis=0))\n",
+        "    new_prediction = np.squeeze(new_prediction, axis=0)\n",
+        "    predicted_frame = np.expand_dims(new_prediction[-1, ...], axis=0)\n",
+        "\n",
+        "    # Extend the set of prediction frames.\n",
+        "    frames = np.concatenate((frames, predicted_frame), axis=0)\n",
+        "\n",
+        "# Construct a figure for the original and new frames.\n",
+        "fig, axes = plt.subplots(2, 10, figsize=(20, 4))\n",
+        "\n",
+        "# Plot the original frames.\n",
+        "for idx, ax in enumerate(axes[0]):\n",
+        "    ax.imshow(np.squeeze(original_frames[idx]), cmap=\"gray\")\n",
+        "    ax.set_title(f\"Frame {idx + 11}\")\n",
+        "    ax.axis(\"off\")\n",
+        "\n",
+        "# Plot the new frames.\n",
+        "new_frames = frames[10:, ...]\n",
+        "for idx, ax in enumerate(axes[1]):\n",
+        "    ax.imshow(np.squeeze(new_frames[idx]), cmap=\"gray\")\n",
+        "    ax.set_title(f\"Frame {idx + 11}\")\n",
+        "    ax.axis(\"off\")\n",
+        "\n",
+        "# Display the figure.\n",
+        "plt.show()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "78OrJXZfx66R"
+      },
+      "source": [
+        "## Predicted Videos\n",
+        "\n",
+        "Finally, we'll pick a few examples from the validation set\n",
+        "and construct some GIFs with them to see the model's\n",
+        "predicted videos."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "ncMx34rLx66R"
+      },
+      "outputs": [],
+      "source": [
+        "# Select a few random examples from the dataset.\n",
+        "examples = val_dataset[np.random.choice(range(len(val_dataset)), size=5)]\n",
+        "\n",
+        "# Iterate over the examples and predict the frames.\n",
+        "predicted_videos = []\n",
+        "for example in examples:\n",
+        "    # Pick the first/last ten frames from the example.\n",
+        "    frames = example[:10, ...]\n",
+        "    original_frames = example[10:, ...]\n",
+        "    new_predictions = np.zeros(shape=(10, *frames[0].shape))\n",
+        "\n",
+        "    # Predict 10 frames one step ahead, feeding ground-truth context.\n",
+        "    for i in range(10):\n",
+        "        # Frames 0 to 9 + i in, so the last output predicts frame 10 + i.\n",
+        "        frames = example[: 10 + i, ...]\n",
+        "        new_prediction = model.predict(np.expand_dims(frames, axis=0))\n",
+        "        new_prediction = np.squeeze(new_prediction, axis=0)\n",
+        "        predicted_frame = np.expand_dims(new_prediction[-1, ...], axis=0)\n",
+        "\n",
+        "        # Extend the set of prediction frames.\n",
+        "        new_predictions[i] = predicted_frame\n",
+        "\n",
+        "    # Create and save GIFs for each of the ground truth/prediction images.\n",
+        "    for frame_set in [original_frames, new_predictions]:\n",
+        "        # Convert the grayscale frames to 3-channel uint8 images.\n",
+        "        current_frames = np.squeeze(frame_set)\n",
+        "        current_frames = current_frames[..., np.newaxis] * np.ones(3)\n",
+        "        current_frames = (current_frames * 255).astype(np.uint8)\n",
+        "        current_frames = list(current_frames)\n",
+        "\n",
+        "        # Construct a GIF from the frames.\n",
+        "        with io.BytesIO() as gif:\n",
+        "            imageio.mimsave(gif, current_frames, \"GIF\", fps=5)\n",
+        "            predicted_videos.append(gif.getvalue())\n",
+        "\n",
+        "# Display the videos.\n",
+        "print(\" Truth\\tPrediction\")\n",
+        "for i in range(0, len(predicted_videos), 2):\n",
+        "    # Construct and display an `HBox` with the ground truth and prediction.\n",
+        "    box = HBox(\n",
+        "        [\n",
+        "            widgets.Image(value=predicted_videos[i]),\n",
+        "            widgets.Image(value=predicted_videos[i + 1]),\n",
+        "        ]\n",
+        "    )\n",
+        "    display(box)"
+      ]
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "collapsed_sections": [],
+      "name": "conv_lstm",
+      "provenance": [],
+      "toc_visible": true
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.7.0"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
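
The frame shifting and the autoregressive prediction loop in the notebook can be checked on toy data without training anything. The following is a minimal sketch under stated assumptions, not part of the commit: `rollout` and `fake_predict` are hypothetical names, and `fake_predict` is only a stand-in for `model.predict` that returns one output per input frame. `create_shifted_frames` mirrors the notebook's helper but slices with `:-1` and `1:`.

```python
import numpy as np


def create_shifted_frames(data):
    """Return (x, y) where y is x advanced by one frame along axis 1."""
    x = data[:, :-1, ...]
    y = data[:, 1:, ...]
    return x, y


def rollout(predict_fn, context, steps):
    """Autoregressively extend `context` (frames, H, W, C) by `steps` frames.

    `predict_fn` is a hypothetical stand-in for `model.predict`: it takes a
    batch of shape (1, frames, H, W, C) and returns one output per input frame.
    """
    frames = context
    for _ in range(steps):
        prediction = predict_fn(frames[np.newaxis, ...])[0]
        # The output at the last time step is the guess for the next frame.
        frames = np.concatenate([frames, prediction[-1:]], axis=0)
    return frames[len(context):]


# Toy data: 2 videos, 6 frames, 4x4 pixels, 1 channel. Every pixel of
# frame t holds the value t, so the shift is easy to verify.
ramp = np.arange(6, dtype=np.float32).reshape(1, 6, 1, 1, 1)
videos = np.tile(ramp, (2, 1, 4, 4, 1))

x, y = create_shifted_frames(videos)
print(x.shape, y.shape)  # (2, 5, 4, 4, 1) (2, 5, 4, 4, 1)


# Stand-in predictor (assumption): "next frame = current frame + 1".
def fake_predict(batch):
    return batch + 1.0


future = rollout(fake_predict, videos[0, :3], steps=3)
print(future[:, 0, 0, 0])  # [3. 4. 5.]
```

With a trained model, passing `model.predict` as `predict_fn` and the first ten frames of a validation example as `context` would reproduce the loop in the "Frame Prediction Visualizations" cell.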