arXiv:2405.13473

Class-Conditional self-reward mechanism for improved Text-to-Image models

Published on May 22, 2024

Abstract

AI-generated summary: A self-rewarding mechanism for Text-to-Image models improves image quality and efficiency by fine-tuning diffusion models on self-generated, self-judged datasets.

Self-rewarding models have emerged recently as a powerful tool in the field of Natural Language Processing (NLP), allowing language models to generate high-quality, relevant responses by providing their own rewards during training. This technique addresses the limitations of methods that rely on human preference data. In this paper, we build upon the concept of self-rewarding models and introduce its vision equivalent for Text-to-Image generative AI models. The approach works by fine-tuning a diffusion model on a self-generated, self-judged dataset, making the fine-tuning more automated and improving data quality. The proposed mechanism makes use of other pre-trained models, such as vocabulary-based object detection and image captioning, and is conditioned on a set of object classes for which the user wants to improve the quality of generated data. The approach has been implemented, fine-tuned, and evaluated on Stable Diffusion, leading to performance evaluated to be at least 60% better than that of existing commercial and research Text-to-Image models. Additionally, the self-rewarding mechanism enables fully automated image generation while increasing the visual quality of the generated images and improving adherence to prompt instructions. The code used in this work is freely available at https://github.com/safouaneelg/SRT2I.
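As a rough sketch of the loop the abstract describes, the Python below self-generates images for a set of target classes, self-judges them with pre-trained open-vocabulary detection and captioning models, and keeps only the accepted (image, prompt) pairs for fine-tuning. The specific models (Stable Diffusion v1.5, OWL-ViT, BLIP), the prompt template, the acceptance rule, and the thresholds are illustrative assumptions, not the paper's exact choices; the authors' actual implementation is in the linked SRT2I repository.

```python
# Minimal sketch of a class-conditional self-rewarding loop for
# text-to-image fine-tuning. Model choices, prompts, and thresholds
# are illustrative assumptions, not the paper's implementation.
import torch
from diffusers import StableDiffusionPipeline
from transformers import pipeline as hf_pipeline

cuda = torch.cuda.is_available()

# Generator: the diffusion model being improved (model choice assumed).
sd = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if cuda else torch.float32,
).to("cuda" if cuda else "cpu")

# Judges: a pre-trained open-vocabulary detector and an image captioner.
detector = hf_pipeline(
    "zero-shot-object-detection",
    model="google/owlvit-base-patch32",
    device=0 if cuda else -1,
)
captioner = hf_pipeline(
    "image-to-text",
    model="Salesforce/blip-image-captioning-base",
    device=0 if cuda else -1,
)

# Class conditioning: the object classes the user wants to improve.
target_classes = ["dog", "bicycle"]

def self_judge(image, cls, det_threshold=0.5):
    """Accept an image only if the target class is detected and the
    caption mentions it. Rule and threshold are assumptions."""
    detections = detector(image, candidate_labels=[cls])
    detected = any(d["score"] >= det_threshold for d in detections)
    caption = captioner(image)[0]["generated_text"].lower()
    return detected and cls in caption

# Self-generate and self-judge a fine-tuning dataset.
dataset = []
for cls in target_classes:
    prompt = f"a photo of a {cls} in a park"  # illustrative prompt template
    for _ in range(4):  # a few samples per class; real runs use many more
        image = sd(prompt).images[0]
        if self_judge(image, cls):
            dataset.append({"image": image, "text": prompt})

# The accepted pairs would then feed a standard text-to-image
# fine-tuning script for the diffusion model.
print(f"kept {len(dataset)} self-judged pairs for fine-tuning")
```

In a full run, the accepted pairs would be written to disk and passed to a standard text-to-image fine-tuning script, and the generate-judge-fine-tune loop could in principle be iterated for further rounds of self-improvement.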
