A GPT2M model trained on a larger version of the commaVQ dataset.
This model can generate driving video unconditionally.
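A minimal sketch of what such unconditional sampling could look like is shown below, using a GPT-2-style model through Hugging Face `transformers`. The checkpoint name, the tokens-per-frame count, and the number of frames are illustrative assumptions, not values confirmed by this project; a VQ decoder (not shown) would be needed to turn sampled token grids back into images.

```python
# Minimal sketch of unconditional autoregressive sampling with a GPT-2-style
# model over VQ tokens. MODEL_NAME, TOKENS_PER_FRAME, and N_FRAMES are
# illustrative assumptions, not values confirmed by this project.
import torch
from transformers import GPT2LMHeadModel

MODEL_NAME = "gpt2-medium"  # placeholder; substitute the trained GPT2M checkpoint
TOKENS_PER_FRAME = 129      # assumed: 1 BOS token plus a grid of VQ tokens per frame
N_FRAMES = 7                # kept small so the sample fits vanilla GPT-2's 1024-token context

model = GPT2LMHeadModel.from_pretrained(MODEL_NAME).eval()

# Seed with a single BOS token and let the model imagine the rest.
seed = torch.tensor([[model.config.bos_token_id]])
with torch.no_grad():
    out = model.generate(
        seed,
        max_new_tokens=TOKENS_PER_FRAME * N_FRAMES,
        do_sample=True,
        top_k=50,
        pad_token_id=model.config.eos_token_id,
    )

# Group the flat token stream into per-frame token grids; a VQ decoder
# would map each grid back to an image frame.
frames = out[0, 1:].reshape(N_FRAMES, TOKENS_PER_FRAME)
print(frames.shape)  # torch.Size([7, 129])
```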
Below is an example of 5 seconds of video imagined by GPT2M.