---
library_name: diffusers
---

# MGIE

This repository contains the UNet and LLaVA model checkpoints from [Guiding Instruction-based Image Editing via Multimodal Large Language Models](https://arxiv.org/abs/2309.17102).

For a detailed example of usage, refer to [this notebook](https://github.com/apple/ml-mgie/blob/main/demo.ipynb) and the [official repository](https://github.com/apple/ml-mgie).

Additionally, this notebook is a memory-optimized version of the original one. It decouples the MGIE inference pipeline into the following stages:

1. Calculate all the embeddings in a batched manner with the LLaVA model and the edit head.
2. Pop the LLaVA model off memory to reclaim VRAM.
3. Load the InstructPix2Pix pipeline and perform the editing.

A minimal sketch of this decoupled flow is included at the end of this card.

💡 MGIE needs additional setup steps that are important to follow before running inference. Please refer to the official repository for those instructions.

## Citation

```
@inproceedings{fu2024mgie,
  author = {Tsu-Jui Fu and Wenze Hu and Xianzhi Du and William Yang Wang and Yinfei Yang and Zhe Gan},
  title = {{Guiding Instruction-based Image Editing via Multimodal Large Language Models}},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2024}
}
```
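
## Memory-optimized inference sketch

The snippet below is a minimal sketch of the decoupled flow described above, not the official implementation. `load_mgie_llava` and `compute_edit_embeddings` are hypothetical placeholders for the embedding steps performed with the LLaVA model and the edit head in the official demo notebook; the InstructPix2Pix loading follows standard diffusers usage and assumes the `timbrooks/instruct-pix2pix` base checkpoint.

```python
import gc

import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

device = "cuda"
image = Image.open("input.png").convert("RGB")

# Stage 1: compute edit-aware text embeddings with the LLaVA model and the edit head.
# `load_mgie_llava` and `compute_edit_embeddings` are hypothetical placeholders for the
# corresponding steps in the official demo notebook.
llava, edit_head = load_mgie_llava(device=device)
prompt_embeds, negative_prompt_embeds = compute_edit_embeddings(
    llava, edit_head, image, instruction="make the sky look like a sunset"
)

# Stage 2: pop the LLaVA model off memory to reclaim VRAM before diffusion.
del llava, edit_head
gc.collect()
torch.cuda.empty_cache()

# Stage 3: load the InstructPix2Pix pipeline and run editing with the precomputed embeddings.
# In the full MGIE flow, the UNet checkpoint shipped in this repository replaces the
# pipeline's default UNet; see the official notebook for the exact loading code.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to(device)
edited = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,
).images[0]
edited.save("edited.png")
```

Freeing the LLaVA model before loading the diffusion pipeline is what keeps peak VRAM low; the two models never have to reside on the GPU at the same time.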