Monetico: An Efficient Reproduction of Meissonic for Text-to-Image Synthesis
Introduction
Like Meissonic, Monetico is a non-autoregressive masked image modeling (MIM) model for text-to-image synthesis, capable of generating high-resolution images and designed to run efficiently on consumer-grade graphics cards.
Monetico is an efficient reproduction of Meissonic. Trained on 8 H100 GPUs for approximately one week, it generates high-quality 512x512 images comparable to those produced by Meissonic and SDXL.
Monetico was developed by Collov Labs. We extend our gratitude to @MeissonFlow and @viiika for their valuable advice on efficient training.
Usage
For detailed usage instructions, please refer to the GitHub repository. A minimal starting point is sketched below.
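As a minimal, hedged starting point, the snippet below only downloads the model weights from the Hugging Face Hub so that the inference scripts from the GitHub repository can be run against the local checkpoint. The repo id `Collov-Labs/Monetico` is an assumption; substitute the actual model id if it differs.

```python
# Minimal sketch: fetch the Monetico checkpoint from the Hugging Face Hub.
# Inference itself is handled by the scripts in the GitHub repository,
# which can be pointed at the downloaded directory.
from huggingface_hub import snapshot_download

# NOTE: repo_id is an assumption; adjust if the model lives under a different id.
local_dir = snapshot_download(repo_id="Collov-Labs/Monetico")
print(f"Model files downloaded to: {local_dir}")
```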
Citation
If you find this work helpful, please consider citing:
@article{bai2024meissonic,
title={Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis},
author={Bai, Jinbin and Ye, Tian and Chow, Wei and Song, Enxin and Chen, Qing-Guo and Li, Xiangtai and Dong, Zhen and Zhu, Lei and Yan, Shuicheng},
journal={arXiv preprint arXiv:2410.08261},
year={2024}
}