trf-sg2im

Model card for the paper "Transformer-Based Image Generation from Scene Graphs". The original implementation is available on GitHub.

[Figure: teaser image]

Model

This model generates images from scene graphs in two stages. In the first stage, a transformer-based architecture with Laplacian Positional Encoding takes the scene graph as input and predicts a layout. In the second stage, the estimated layout conditions an autoregressive, GPT-like transformer that composes the image in a discrete latent space. A VQVAE then converts this latent representation into the final image.
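To make the Laplacian Positional Encoding step more concrete, here is a minimal NumPy sketch of the general technique: each node of the graph gets a vector built from the low-frequency eigenvectors of the graph's normalized Laplacian. This is not the authors' code. The function name `laplacian_positional_encoding`, the example objects and triples, the choice to ignore edge direction, and the use of the symmetric normalized Laplacian are all assumptions made for illustration; the paper's exact variant may differ.

```python
import numpy as np


def laplacian_positional_encoding(num_nodes, edges, k):
    """Return a (num_nodes, k) array of Laplacian positional encodings.

    Hypothetical helper for illustration only. Edges are treated as
    undirected and the graph is assumed to be connected.
    """
    # Symmetric adjacency matrix; self-loops are skipped.
    adj = np.zeros((num_nodes, num_nodes))
    for subj, obj in edges:
        if subj != obj:
            adj[subj, obj] = adj[obj, subj] = 1.0

    # D^{-1/2}, leaving isolated nodes at zero to avoid division by zero.
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5

    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    lap = np.eye(num_nodes) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigvecs = np.linalg.eigh(lap)

    # Drop the first (trivial) eigenvector and keep the next k.
    pe = eigvecs[:, 1:k + 1]

    # Zero-pad when the graph has too few nodes to provide k eigenvectors.
    if pe.shape[1] < k:
        pe = np.pad(pe, ((0, 0), (0, k - pe.shape[1])))
    return pe


# Hypothetical scene graph: objects are nodes, (subject, object) pairs are edges.
objects = ["sky", "tree", "grass", "sheep"]
triples = [(3, "standing on", 2), (1, "behind", 3), (0, "above", 1)]
edges = [(s, o) for s, _, o in triples]

pe = laplacian_positional_encoding(len(objects), edges, k=2)
print(pe.shape)
```

The sign of each eigenvector is arbitrary, so implementations of this technique often flip signs at random during training. Whether this model does so is not stated here.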

[Figure: model architecture]

Usage

For usage instructions, please refer to the original GitHub repo.

Results

Comparison with other state-of-the-art approaches

[Figure: comparison results]

Datasets used to train rsortino/trf-sg2im