@vladbogo on Hugging Face: "Genie is a new method from Google DeepMind that generates interactive…"

vladbogo

posted an update Feb 27

Genie is a new method from Google DeepMind that generates interactive, action-controllable virtual worlds from unlabelled internet videos.

Key points:
* Genie leverages a spatiotemporal video tokenizer, an autoregressive dynamics model, and a latent action model to generate controllable video environments.
* The model is trained on video data alone, without requiring action labels, using unsupervised learning to infer latent actions between frames.
* The method restricts the latent action vocabulary to 8 discrete actions, which keeps the action space small enough for a person to control the generated environment.
* The training dataset is built by filtering publicly available internet videos with criteria specific to 2D platformer games, yielding 6.8M videos.
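
To make the "vocabulary of 8 latent actions" idea concrete, here is a toy sketch of nearest-neighbour quantization: a continuous latent vector is snapped to the closest of 8 codebook entries, and the index of that entry is the discrete action. This is only an illustration, not Genie's implementation. The latent dimension, the random codebook, and the names `make_codebook` and `quantize` are all assumptions made up for this example; in a real model the codebook would be learned.

```python
import random

NUM_ACTIONS = 8   # vocabulary size mentioned in the post
LATENT_DIM = 4    # hypothetical dimension, chosen only for illustration


def make_codebook(seed=0):
    """Build a random codebook; a real model would learn these vectors."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
            for _ in range(NUM_ACTIONS)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent, codebook):
    """Return the index of the codebook entry nearest to `latent`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(latent, codebook[i]))


codebook = make_codebook()
# Stand-in for a continuous latent inferred between two frames.
continuous_latent = [0.1, -0.3, 0.7, 0.2]
action_id = quantize(continuous_latent, codebook)
print(f"discrete latent action: {action_id}")
```

Whatever the latent looks like, the result is always one of 8 integer ids, which is what lets a user drive the model with a handful of buttons.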

Paper: Genie: Generative Interactive Environments (2402.15391)
Project page: https://sites.google.com/view/genie-2024/
More detailed overview in my blog: https://huggingface.co/blog/vladbogo/genie-generative-interactive-environments

Congrats to the authors for their work!

merve

Feb 27

Thanks a lot for the blog post, it's very informative 🤗

osanseviero

Feb 27

text to game is super exciting!

vladbogo

Feb 27

totally agree. Can't wait to see what comes next
