rromb committed
Commit ebcf159 • 1 Parent(s): 49117c2

add autoencoder training details, arxiv link and figures

Former-commit-id: f8b4a071055f5b25421d0364770267d5fc58d79c

README.md CHANGED
@@ -1,4 +1,23 @@
 # Latent Diffusion Models
+ [arXiv](https://arxiv.org/abs/2112.10752) | [BibTeX](#bibtex)
+
+ <p align="center">
+ <img src=assets/results.gif />
+ </p>
+
+
+
+ [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)<br/>
+ [Robin Rombach](https://github.com/rromb)\*,
+ [Andreas Blattmann](https://github.com/ablattmann)\*,
+ [Dominik Lorenz](https://github.com/qp-qp)\,
+ [Patrick Esser](https://github.com/pesser),
+ [Björn Ommer](https://hci.iwr.uni-heidelberg.de/Staff/bommer)<br/>
+ \* equal contribution
+
+ <p align="center">
+ <img src=assets/modelfigure.png />
+ </p>

 ## Requirements
 A suitable [conda](https://conda.io/) environment named `ldm` can be created
@@ -31,12 +50,24 @@ conda activate ldm
 ### Get the models

 Running the following script downloads and extracts all available pretrained autoencoding models.
-
 ```shell script
 bash scripts/download_first_stages.sh
 ```

 The first stage models can then be found in `models/first_stage_models/<model_spec>`
+ ### Training autoencoder models
+
+ Configs for training a KL-regularized autoencoder on ImageNet are provided at `configs/autoencoder`.
+ Training can be started by running
+ ```
+ CUDA_VISIBLE_DEVICES=<GPU_ID> python main.py --base configs/autoencoder/<config_spec> -t --gpus 0,
+ ```
+ where `config_spec` is one of {`autoencoder_kl_8x8x64.yaml` (f=32, d=64), `autoencoder_kl_16x16x16.yaml` (f=16, d=16),
+ `autoencoder_kl_32x32x4` (f=8, d=4), `autoencoder_kl_64x64x3` (f=4, d=3)}.
+
+ For training VQ-regularized models, see the [taming-transformers](https://github.com/CompVis/taming-transformers)
+ repository.
+

 ## Pretrained LDMs
 | Dataset | Task | Model | FID | IS | Prec | Recall | Link | Comments
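For concreteness, the `<config_spec>` placeholder in the training command added above takes one of the config names listed there. The invocation below is a sketch only: GPU 0 and the f=8, d=4 config are illustrative choices, and the `.yaml` extension on that filename is assumed by analogy with the first two entries in the list.

```
# Illustrative only: train the f=8, d=4 KL autoencoder on GPU 0.
# The trailing comma in "--gpus 0," makes the argument parse as a list of
# device indices (i.e. GPU 0) rather than as a count of zero GPUs.
CUDA_VISIBLE_DEVICES=0 python main.py --base configs/autoencoder/autoencoder_kl_32x32x4.yaml -t --gpus 0,
```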
 
@@ -102,4 +133,17 @@ Thanks for open-sourcing!
 - The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).


+ ## BibTeX
+
+ ```
+ @misc{rombach2021highresolution,
+       title={High-Resolution Image Synthesis with Latent Diffusion Models},
+       author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
+       year={2021},
+       eprint={2112.10752},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV}
+ }
+ ```
+

assets/modelfigure.png ADDED
assets/results.gif.REMOVED.git-id ADDED
@@ -0,0 +1 @@
+ 82b6590e670a32196093cc6333ea19e6547d07de