I can't make it work in Google Colab
#1, opened by QES
README.md CHANGED
@@ -17,8 +17,6 @@ Our generative model has `Next-DiT` as the backbone, the text encoder is the `Ge
|
|
17 |
|
18 |
[paper](https://arxiv.org/abs/2405.05945)
|
19 |
|
20 |
-
![hero](https://github.com/Alpha-VLLM/Lumina-T2X/assets/54879512/9f52eabb-07dc-4881-8257-6d8a5f2a0a5a)
|
21 |
-
|
22 |
## News
|
23 |
|
24 |
- [2024-06-08] We have released the `Lumina-Next-SFT` model.
|
@@ -134,7 +132,7 @@ pip install -e .
|
|
134 |
(Recommended) Use `huggingface-cli` to download our model:
|
135 |
|
136 |
```bash
|
137 |
-
huggingface-cli download --resume-download Alpha-VLLM/Lumina-Next-
|
138 |
```
|
139 |
|
140 |
or using git for cloning the model you want to use:
|
@@ -153,9 +151,9 @@ Update your own personal inference settings to generate different styles of imag
|
|
153 |
- settings:
|
154 |
|
155 |
model:
|
156 |
-
ckpt: ""
|
157 |
-
ckpt_lm: ""
|
158 |
-
token: ""
|
159 |
|
160 |
transport:
|
161 |
path_type: "Linear" # option: ["Linear", "GVP", "VP"]
|
@@ -171,17 +169,41 @@ Update your own personal inference settings to generate different styles of imag
|
|
171 |
likelihood: false # option: true or false
|
172 |
|
173 |
infer:
|
174 |
-
resolution: "1024x1024"
|
175 |
-
num_sampling_steps: 60
|
176 |
-
cfg_scale: 4.
|
177 |
-
solver: "euler"
|
178 |
-
t_shift: 4
|
179 |
-
|
180 |
-
|
181 |
-
|
182 |
-
seed: 0 # range: any number
|
183 |
```
|
184 |
|
185 |
1. Run with CLI
|
186 |
|
187 |
inference command:
|
|
|
17 |
|
18 |
[paper](https://arxiv.org/abs/2405.05945)
|
19 |
|
|
|
|
|
20 |
## News
|
21 |
|
22 |
- [2024-06-08] We have released the `Lumina-Next-SFT` model.
|
|
|
132 |
(Recommended) Use `huggingface-cli` to download our model:
|
133 |
|
134 |
```bash
|
135 |
+
huggingface-cli download --resume-download Alpha-VLLM/Lumina-Next-T2I --local-dir /path/to/ckpt
|
136 |
```
|
137 |
|
138 |
or using git for cloning the model you want to use:
|
|
|
151 |
- settings:
|
152 |
|
153 |
model:
|
154 |
+
ckpt: "/path/to/ckpt" # if ckpt is "", pass the model path with `--ckpt` when using the `lumina` CLI.
|
155 |
+
ckpt_lm: "" # if ckpt_lm is "", pass the LLM path with `--ckpt_lm` when using the `lumina` CLI.
|
156 |
+
token: "" # if the LLM is a Hugging Face gated repo, enter your Hugging Face access token here; if token is "", pass it with `--token` instead.
|
157 |
|
158 |
transport:
|
159 |
path_type: "Linear" # option: ["Linear", "GVP", "VP"]
|
|
|
169 |
likelihood: false # option: true or false
|
170 |
|
171 |
infer:
|
172 |
+
resolution: "1024x1024" # option: ["1024x1024", "512x2048", "2048x512", "(Extrapolation) 1664x1664", "(Extrapolation) 1024x2048", "(Extrapolation) 2048x1024"]
|
173 |
+
num_sampling_steps: 60 # range: 1-1000
|
174 |
+
cfg_scale: 4. # range: 1-20
|
175 |
+
solver: "euler" # option: ["euler", "dopri5", "dopri8"]
|
176 |
+
t_shift: 4 # range: 1-20 (int only)
|
177 |
+
ntk_scaling: true # option: true or false
|
178 |
+
proportional_attn: true # option: true or false
|
179 |
+
seed: 0 # range: any number
|
|
|
180 |
```
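The comments on `ckpt`, `ckpt_lm`, and `token` describe a fallback: when a value is `""` in the config, it has to come from the matching CLI flag. A minimal sketch of that rule follows. The helper name `resolve_setting`, the precedence (a CLI value wins when both are set), and the `/path/to/llm` placeholder are assumptions for illustration, not the `lumina` CLI's actual code.

```python
from typing import Optional


def resolve_setting(name: str, config_value: str, cli_value: Optional[str]) -> str:
    """Pick the value for a required setting from the CLI flag or the config file.

    Hypothetical helper: precedence (CLI over config) is an assumption.
    """
    if cli_value:
        # A non-empty CLI flag overrides whatever the config holds.
        return cli_value
    if config_value:
        # Otherwise fall back to the value written in the config.
        return config_value
    # Both are empty: mirror the README comment and ask for the flag.
    raise ValueError(f"`{name}` is empty in the config; pass --{name} on the command line")


# Values as they appear in the updated README example.
model_cfg = {"ckpt": "/path/to/ckpt", "ckpt_lm": "", "token": ""}

print(resolve_setting("ckpt", model_cfg["ckpt"], None))
print(resolve_setting("ckpt_lm", model_cfg["ckpt_lm"], "/path/to/llm"))
```

Note that `token` is only needed for gated repos, so a real implementation would probably treat an empty token as optional rather than raising.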
|
181 |
|
182 |
+
- model:
|
183 |
+
- `ckpt`: lumina-next-t2i checkpoint path from [huggingface repo](https://huggingface.co/Alpha-VLLM/Lumina-Next-T2I) containing `consolidated*.pth` and `model_args.pth`.
|
184 |
+
- `ckpt_lm`: LLM checkpoint.
|
185 |
+
- `token`: huggingface access token for accessing gated repo.
|
186 |
+
- transport:
|
187 |
+
- `path_type`: the type of path for transport: 'Linear', 'GVP' (Generalized Variance Preserving), or 'VP' (Variance Preserving).
|
188 |
+
- `prediction`: the quantity the model predicts for the transport dynamics. (options: ["velocity", "score", "noise"])
|
189 |
+
- `loss_weight`: the weighting of different components in the loss function: 'velocity' for dynamic modeling, 'likelihood' for statistical consistency, or None for no weighting.
|
190 |
+
- `sample_eps`: small epsilon offset applied to the time interval during sampling, for numerical stability.
|
191 |
+
- `train_eps`: small epsilon offset applied to the time interval during training, to stabilize the learning process.
|
192 |
+
- ode:
|
193 |
+
- `atol`: absolute tolerance for the ODE solver.
|
194 |
+
- `rtol`: relative tolerance for the ODE solver.
|
195 |
+
- `reverse`: run the ODE solver in reverse. (option: true or false)
|
196 |
+
- `likelihood`: enable calculation of likelihood during the ODE solving process.
|
197 |
+
- infer:
|
198 |
+
- `resolution`: generated image resolution.
|
199 |
+
- `num_sampling_steps`: number of sampling steps used to generate an image.
|
200 |
+
- `cfg_scale`: classifier-free guidance scale.
|
201 |
+
- `solver`: solver for image generation.
|
202 |
+
- `t_shift`: time shift factor.
|
203 |
+
- `ntk_scaling`: whether to use NTK-aware RoPE scaling.
|
204 |
+
- `proportional_attn`: whether to use proportional attention.
|
205 |
+
- `seed`: random seed for initialization.
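
The option lists and ranges given in the `infer` comments of this diff can be checked before starting a run. The sketch below is one way to do that. The function `validate_infer` and its error messages are hypothetical and not part of the repository, and treating `seed` as an integer is an assumption (the README only says "any number").

```python
# Allowed values copied from the comments added to the README in this diff.
ALLOWED_RESOLUTIONS = {
    "1024x1024", "512x2048", "2048x512",
    "(Extrapolation) 1664x1664",
    "(Extrapolation) 1024x2048",
    "(Extrapolation) 2048x1024",
}
ALLOWED_SOLVERS = {"euler", "dopri5", "dopri8"}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_infer(cfg):
    """Return a list of problems found in an `infer` mapping (empty if none)."""
    errors = []
    if cfg.get("resolution") not in ALLOWED_RESOLUTIONS:
        errors.append("resolution: not one of the listed options")
    steps = cfg.get("num_sampling_steps")
    if not (_is_int(steps) and 1 <= steps <= 1000):
        errors.append("num_sampling_steps: expected an integer in 1-1000")
    scale = cfg.get("cfg_scale")
    if not (_is_number(scale) and 1 <= scale <= 20):
        errors.append("cfg_scale: expected a number in 1-20")
    if cfg.get("solver") not in ALLOWED_SOLVERS:
        errors.append("solver: expected euler, dopri5, or dopri8")
    t_shift = cfg.get("t_shift")
    if not (_is_int(t_shift) and 1 <= t_shift <= 20):
        errors.append("t_shift: expected an integer in 1-20")
    for flag in ("ntk_scaling", "proportional_attn"):
        if not isinstance(cfg.get(flag), bool):
            errors.append(f"{flag}: expected true or false")
    if not _is_int(cfg.get("seed")):
        # Assumption: seeds are integers.
        errors.append("seed: expected an integer")
    return errors


# The values shown in the README example.
default_infer = {
    "resolution": "1024x1024",
    "num_sampling_steps": 60,
    "cfg_scale": 4.0,
    "solver": "euler",
    "t_shift": 4,
    "ntk_scaling": True,
    "proportional_attn": True,
    "seed": 0,
}

print(validate_infer(default_infer))
print(validate_infer({**default_infer, "solver": "rk4", "t_shift": 2.5}))
```

Returning a list of messages instead of raising on the first problem lets a user fix every setting in one pass.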
|
206 |
+
|
207 |
1. Run with CLI
|
208 |
|
209 |
inference command:
|