---
license: openrail++
language:
- en
library_name: diffusers
tags:
- text-to-image
- prior
- unclip
- kandinskyv2.2
---


# Introduction

This ECLIPSE model weight is a tiny (33M-parameter) non-diffusion text-to-image prior model trained on a 5M-image subset of LAION-HighRes.

Despite its small size and limited training data, the ECLIPSE prior achieves results comparable to those of 1-billion-parameter T2I prior models trained on millions of image-text pairs.

- **Project Page:** [https://eclipse-t2i.vercel.app](https://eclipse-t2i.vercel.app)
- **GitHub:** [https://github.com/eclipse-t2i/eclipse-inference](https://github.com/eclipse-t2i/eclipse-inference)


## Evaluations

![Qualitative Examples](./assets/example.png)

![Results](./assets/results.png)

## Installation
```bash
git clone git@github.com:eclipse-t2i/eclipse-inference.git
cd eclipse-inference

conda create -p ./venv python=3.9
conda activate ./venv
pip install -r requirements.txt
```
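After installation, a quick import check can confirm the environment is ready. This is a minimal sketch, assuming `requirements.txt` installs `torch` and `diffusers`:

```python
# Sanity check: verify the core dependencies and GPU visibility.
# Assumes requirements.txt installs torch and diffusers.
import torch
import diffusers

print("torch:", torch.__version__)
print("diffusers:", diffusers.__version__)
print("CUDA available:", torch.cuda.is_available())
```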

## Run Inference

This repository supports two pre-trained image decoders: [Karlo-v1-alpha](https://huggingface.co/kakaobrain/karlo-v1-alpha) and [Kandinsky-v2.2](https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder).
Note: the ECLIPSE prior is not a diffusion model, while the image decoders are.

### Karlo Inference
```python
from src.pipelines.pipeline_unclip import UnCLIPPipeline
from src.priors.prior_transformer import PriorTransformer

# Load the ECLIPSE prior and plug it into the Karlo unCLIP pipeline.
prior = PriorTransformer.from_pretrained("ECLIPSE-Community/ECLIPSE_Karlo_Prior")
pipe = UnCLIPPipeline.from_pretrained("kakaobrain/karlo-v1-alpha", prior=prior).to("cuda")

prompt = "black apples in the basket"
images = pipe(prompt, decoder_guidance_scale=7.5).images

images[0]
```
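For reproducible outputs, the sampler can be seeded and the result saved to disk. This is a sketch, assuming the custom `UnCLIPPipeline` keeps the standard diffusers `generator` argument; the output filename is arbitrary:

```python
import torch

# Seed the sampler for reproducible images; outputs are PIL.Image objects.
# Assumes the custom UnCLIPPipeline accepts the standard diffusers `generator` argument.
generator = torch.Generator(device="cuda").manual_seed(42)
images = pipe(
    "black apples in the basket",
    decoder_guidance_scale=7.5,
    generator=generator,
).images
images[0].save("karlo_sample.png")  # hypothetical output path
```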

### Kandinsky Inference
```python
from src.pipelines.pipeline_kandinsky_prior import KandinskyPriorPipeline
from src.priors.prior_transformer import PriorTransformer
from diffusers import DiffusionPipeline

# Load the ECLIPSE prior and plug it into the Kandinsky v2.2 prior pipeline.
prior = PriorTransformer.from_pretrained("ECLIPSE-Community/ECLIPSE_KandinskyV22_Prior")
pipe_prior = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", prior=prior).to("cuda")

# The diffusion decoder turns the predicted image embeddings into pixels.
pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder").to("cuda")

prompt = "black apples in the basket"
image_embeds, negative_image_embeds = pipe_prior(prompt).to_tuple()
images = pipe(
    num_inference_steps=50,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
).images

images[0]
```
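Since the prior produces one image embedding per requested sample, several candidates can be generated for a single prompt by batching at the prior stage. This is a sketch, assuming the custom prior pipeline keeps the standard diffusers `num_images_per_prompt` argument; the output filenames are arbitrary:

```python
# Generate several candidates per prompt by batching at the prior stage.
# Assumes the custom KandinskyPriorPipeline keeps the standard
# `num_images_per_prompt` argument from diffusers.
image_embeds, negative_image_embeds = pipe_prior(
    "black apples in the basket", num_images_per_prompt=4
).to_tuple()
images = pipe(
    num_inference_steps=50,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
).images
for i, img in enumerate(images):
    img.save(f"kandinsky_sample_{i}.png")  # hypothetical output paths
```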


## Limitations

The model is intended for research purposes only, to demonstrate a way of reducing unnecessary resource usage in existing T2I research.

As this prior model is trained on a very small LAION subset with CLIP supervision, it inherits the limitations of the CLIP model, such as:
* Lack of spatial understanding.
* Inability to render legible text.
* Difficulty with complex compositionality, which may improve as CLIP improves.
* While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.