amankishore committed
Commit c255c40
Parent: 5426b53

Added subpixel rendering!

Files changed (5)
  1. README-orig.md +27 -22
  2. README.md +9 -1
  3. app.py +7 -3
  4. highres_final_vis.py +124 -0
  5. voxnerf/vox.py +0 -3
README-orig.md CHANGED
@@ -9,26 +9,35 @@
 
 TTI-Chicago, &dagger;Purdue University
 
-The repository contains Pytorch implementation of Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation.
+Abstract: *A diffusion model learns to predict a vector field of gradients. We propose to apply chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and repurposes a pretrained 2D model for 3D data generation. We identify a technical challenge of distribution mismatch that arises in this application, and propose a novel estimation mechanism to resolve it. We run our algorithm on several off-the-shelf diffusion image generative models, including the recently released Stable Diffusion trained on the large-scale LAION dataset.*
 
-> We introduce a method that converts a pretrained 2D diffusion generative model on images into a 3D generative model of radiance fields, without requiring access to any 3D data. The key insight is to interpret diffusion models as learned predictors of a gradient field, often referred to as the score function of the data log-likelihood. We apply the chain rule on the estimated score, hence the name Score Jacobian Chaining (SJC).
 
 <a href="https://arxiv.org/abs/2212.00774"><img src="https://img.shields.io/badge/arXiv-2212.00774-b31b1b.svg" height=22.5></a>
-<a href="https://colab.research.google.com/drive/1zixo66UYGl70VOPy053o7IV_YkQt5lCZ?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=22.5></a>
-<a href="https://pals.ttic.edu/p/score-jacobian-chaining"><img src="https://img.shields.io/website?down_color=lightgrey&down_message=offline&label=Project%20Page&up_color=lightgreen&up_message=online&url=https%3A%2F%2Fpals.ttic.edu%2Fp%2Fscore-jacobian-chaining" height=22.5></a>
+<a href="https://colab.research.google.com/drive/1zixo66UYGl70VOPy053o7IV_YkQt5lCZ?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=22.5></a>
+<a href="https://pals.ttic.edu/p/score-jacobian-chaining"><img src="https://img.shields.io/website?down_color=lightgrey&down_message=offline&label=Project%20Page&up_color=lightgreen&up_message=online&url=https%3A%2F%2Fpals.ttic.edu%2Fp%2Fscore-jacobian-chaining" height=22.5></a>
 
 <!-- [ [arxiv](https://arxiv.org/abs/2212.00774) | [project page](https://pals.ttic.edu/p/score-jacobian-chaining) | [colab](https://colab.research.google.com/drive/1zixo66UYGl70VOPy053o7IV_YkQt5lCZ?usp=sharing ) ] -->
 
 Many thanks to [dvschultz](https://github.com/dvschultz) for the colab.
 
+## Updates
+- We have added subpixel rendering script for final high quality vis. The jittery videos you might have seen should be significantly better now. Please run `python /path/to/sjc/highres_final_vis.py` in the exp folder after the training is complete. There are a few toggles in the script you can play with, but the default is ok. It takes about 5 minutes / 11GB on an A5000, and the extra time is mainly due to SD Decoder.
+- If you are running SJC with a DreamBooth fine-tuned model: the model's output distribution is already significantly narrowed. It might help to use a lower guidance scale `--sd.scale 50.0` for example. Intense mode-seeking is one cause for multi-face problem. We have internally tried DreamBooth with view-dependent prompt fine-tuning. But by and large DreamBooth integration is not ready.
+
+
+## TODOs
+- [ ] make seeds configurable. So far all seeds are hardcoded to 0.
+- [ ] add script to reproduce 2D experiments in Fig 4. The Fig might need change once it's tied to seeds. Note that for a simple aligned domain like faces, simple scheduling like using a single σ=1.5 could already generate some nice images. But not so for bedrooms; it's too diverse and annealing seems still needed.
+- [ ] main paper figures did not use subpix rendering; appendix figures did. Replace the main paper figures to make them consistent.
+
 ## License
-Since we use Stable Diffusion, we are releasing under their OpenRAIL license. Otherwise we do not
-identify any components or upstream code that carry restrictive licensing requirements.
+Since we use Stable Diffusion, we are releasing under their OpenRAIL license. Otherwise we do not
+identify any components or upstream code that carry restrictive licensing requirements.
 
-## Structure
-In addition to SJC, the repo also contains an implementation of [Karras sampler](https://arxiv.org/abs/2206.00364),
-and a customized, simple voxel nerf. We provide the abstract parent class based on Karras et. al. and include
-a few types of diffusion model here. See adapt.py.
+## Structure
+In addition to SJC, the repo also contains an implementation of [Karras sampler](https://arxiv.org/abs/2206.00364),
+and a customized, simple voxel nerf. We provide the abstract parent class based on Karras et. al. and include
+a few types of diffusion model here. See adapt.py.
 
 ## Installation
 
@@ -46,8 +55,8 @@ git clone --depth 1 git@github.com:CompVis/taming-transformers.git && pip instal
 
 ## Downloading checkpoints
 We have bundled a minimal set of things you need to download (SD v1.5 ckpt, gddpm ckpt for LSUN and FFHQ)
-in a tar file, made available at our download server [here](https://dl.ttic.edu/pals/sjc/release.tar).
-It is a single file of 12GB, and you can use wget or curl.
+in a tar file, made available at our download server [here](https://dl.ttic.edu/pals/sjc/release.tar).
+It is a single file of 12GB, and you can use wget or curl.
 
 Remember to __update__ `env.json` to point at the new checkpoint root where you have uncompressed the files.
 
@@ -57,7 +66,7 @@ Make a new directory to run experiments (the script generates many logging files
 mkdir exp
 cd exp
 ```
-Run the following command to generate a new 3D asset. It takes about 25 minutes on a single A5000 GPU for 10000 steps of optimization.
+Run the following command to generate a new 3D asset. It takes about 25 minutes / 10GB GPU mem on a single A5000 GPU for 10000 steps of optimization.
 ```bash
 python /path/to/sjc/run_sjc.py \
 --sd.prompt "A zoomed out high quality photo of Temple of Heaven" \
@@ -86,15 +95,11 @@ python /path/to/sjc/run_sjc.py \
 
 `depth_weight` the weighting factor of the center depth loss
 
-`var_red` whether to use Eq. 16 vs Eq. 15. For some prompts such as Obama we actually see better results with Eq. 15.
+`var_red` whether to use Eq. 16 vs Eq. 15. For some prompts such as Obama we actually see better results with Eq. 15.
 
 Visualization results are stored in the current directory. In directories named `test_*` there are images (under `view`) and videos (under `view_seq`) rendered at different iterations.
 
 
-## TODOs
-- [ ] add sub-pixel rendering script for high quality visualization such as in the teaser.
-- [ ] add script to reproduce 2D experiments in Fig 4. The Fig might need change once it's tied to seeds. Note that for a simple aligned domain like faces, simple scheduling like using a single σ=1.5 could already generate some nice images. But not so for bedrooms; it's too diverse and annealing seems still needed.
-
 ## To Reproduce the Results in the Paper
 First create a clean directory for your experiment, then run one of the following scripts from that folder:
 ### Trump
@@ -200,19 +205,19 @@ python /path/to/sjc/run_sjc.py --sd.prompt "A pig" --n_steps 10000 --lr 0.05 --s
 ```
 python /path/to/sjc/run_nerf.py
 ```
-Our bundle contains a tar ball for the lego bulldozer dataset. Untar it and it will work.
+Our bundle contains a tar ball for the lego bulldozer dataset. Untar it and it will work.
 
 ## To Sample 2D images with the Karras Sampler
 ```
 python /path/to/sjc/run_img_sampling.py
 ```
-Use help -h to see the options available. Will expand the details later.
+Use help -h to see the options available. Will expand the details later.
 
 
-## Bib
+## Bib
 ```
 @article{sjc,
-title={Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation},
+title={Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation},
 author={Wang, Haochen and Du, Xiaodan and Li, Jiahao and Yeh, Raymond A. and Shakhnarovich, Greg},
 journal={arXiv preprint arXiv:2212.00774},
 year={2022},
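A note on the checkpoint bundle referenced in the README above: the release tar at https://dl.ttic.edu/pals/sjc/release.tar is a single file of about 12GB, fetched with wget or curl as the README states. A minimal Python equivalent is sketched below; the `checkpoints/` target directory is only an example, and `env.json` must afterwards point at wherever the files are uncompressed.

```python
# Sketch only: download the ~12 GB checkpoint bundle and unpack it.
# "checkpoints/" is an arbitrary example path; point env.json at it afterwards.
import tarfile
import urllib.request

URL = "https://dl.ttic.edu/pals/sjc/release.tar"

urllib.request.urlretrieve(URL, "release.tar")   # same file wget/curl would fetch
with tarfile.open("release.tar") as tar:
    tar.extractall("checkpoints")                # checkpoint root to reference in env.json
```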
README.md CHANGED
@@ -10,4 +10,12 @@ pinned: false
 license: creativeml-openrail-m
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+## Bib
+```
+@article{sjc,
+title={Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation},
+author={Wang, Haochen and Du, Xiaodan and Li, Jiahao and Yeh, Raymond A. and Shakhnarovich, Greg},
+journal={arXiv preprint arXiv:2212.00774},
+year={2022},
+}
+```
app.py CHANGED
@@ -16,6 +16,7 @@ from voxnerf.utils import every
 from voxnerf.vis import stitch_vis, bad_vis as nerf_vis
 
 from run_sjc import render_one_view, tsr_stats
+from highres_final_vis import highres_render_one_view
 
 import gradio as gr
 import gc
@@ -167,22 +168,25 @@ with gr.Blocks(css=css) as demo:
 
 # TODO: Save Checkpoint
 with torch.no_grad():
+    n_frames=200
+    factor=4
     ckpt = vox.state_dict()
     H, W = poser.H, poser.W
     vox.eval()
-    K, poses = poser.sample_test(100)
+    K, poses = poser.sample_test(n_frames)
+    del n_frames
+    poses = poses[60:] # skip the full overhead view; not interesting
 
     aabb = vox.aabb.T.cpu().numpy()
     vox = vox.to(device_glb)
 
     num_imgs = len(poses)
-
     all_images = []
 
     for i in (pbar := tqdm(range(num_imgs))):
 
         pose = poses[i]
-        y, depth = render_one_view(vox, aabb, H, W, K, pose)
+        y, depth = highres_render_one_view(vox, aabb, H, W, K, pose, f=factor)
         if isinstance(model, StableDiffusion):
             y = model.decode(y)
         pane, img, depth = vis_routine(y, depth)
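For context on the `factor` introduced above: rendering at subpixel factor `f` multiplies the ray count per frame by `f²`, which is why `highres_render_one_view` (added in `highres_final_vis.py` below) processes rays in fixed-size chunks rather than a full frame at once. A rough illustration with placeholder image dimensions (the real `H`, `W` come from `poser.H`, `poser.W`):

```python
# Illustrative arithmetic only; H and W are placeholders, bs matches highres_final_vis.py.
H = W = 64      # placeholder latent resolution
bs = 4096       # rays per chunk, as in highres_render_one_view
for f in (2, 4, 8):               # the script's comments suggest factors in this range
    n_rays = (H * f) * (W * f)    # ray count grows as f**2
    n_chunks = -(-n_rays // bs)   # ceiling division
    print(f"factor {f}: {n_rays} rays -> {n_chunks} chunks")
```

As the script's own comments note, the factor barely changes the end-to-end time; the SD decoder dominates.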
highres_final_vis.py ADDED
@@ -0,0 +1,124 @@
+import numpy as np
+import torch
+from einops import rearrange
+
+from voxnerf.render import subpixel_rays_from_img
+
+from run_sjc import (
+    SJC, ScoreAdapter, StableDiffusion,
+    tqdm, EventStorage, HeartBeat, EarlyLoopBreak, get_event_storage, get_heartbeat, optional_load_config, read_stats,
+    vis_routine, stitch_vis, latest_ckpt,
+    scene_box_filter, render_ray_bundle, as_torch_tsrs,
+    device_glb
+)
+
+
+# the SD decoder is very memory hungry; the latent image cannot be too large
+# for a graphics card with < 12 GB memory, set this to 128; quality already good
+# if your card has 12 to 24 GB memory, you can set this to 200;
+# but visually it won't help beyond a certain point. Our teaser is done with 128.
+decoder_bottleneck_hw = 128
+
+
+def final_vis():
+    cfg = optional_load_config(fname="full_config.yml")
+    assert len(cfg) > 0, "can't find cfg file"
+    mod = SJC(**cfg)
+
+    family = cfg.pop("family")
+    model: ScoreAdapter = getattr(mod, family).make()
+    vox = mod.vox.make()
+    poser = mod.pose.make()
+
+    pbar = tqdm(range(1))
+
+    with EventStorage(), HeartBeat(pbar):
+        ckpt_fname = latest_ckpt()
+        state = torch.load(ckpt_fname, map_location="cpu")
+        vox.load_state_dict(state)
+        vox.to(device_glb)
+
+        with EventStorage("highres"):
+            # what dominates the speed is NOT the factor here.
+            # you can try from 2 to 8, and the speed is about the same.
+            # the dominating factor in the pipeline I believe is the SD decoder.
+            evaluate(model, vox, poser, n_frames=200, factor=4)
+
+
+@torch.no_grad()
+def evaluate(score_model, vox, poser, n_frames=200, factor=4):
+    H, W = poser.H, poser.W
+    vox.eval()
+    K, poses = poser.sample_test(n_frames)
+    del n_frames
+    poses = poses[60:] # skip the full overhead view; not interesting
+
+    fuse = EarlyLoopBreak(5)
+    metric = get_event_storage()
+    hbeat = get_heartbeat()
+
+    aabb = vox.aabb.T.cpu().numpy()
+    vox = vox.to(device_glb)
+
+    num_imgs = len(poses)
+
+    for i in (pbar := tqdm(range(num_imgs))):
+        if fuse.on_break():
+            break
+
+        pose = poses[i]
+        y, depth = highres_render_one_view(vox, aabb, H, W, K, pose, f=factor)
+        if isinstance(score_model, StableDiffusion):
+            y = score_model.decode(y)
+        vis_routine(metric, y, depth)
+
+        metric.step()
+        hbeat.beat()
+
+    metric.flush_history()
+
+    metric.put_artifact(
+        "movie_im_and_depth", ".mp4",
+        lambda fn: stitch_vis(fn, read_stats(metric.output_dir, "view")[1])
+    )
+
+    metric.put_artifact(
+        "movie_im_only", ".mp4",
+        lambda fn: stitch_vis(fn, read_stats(metric.output_dir, "img")[1])
+    )
+
+    metric.step()
+
+
+def highres_render_one_view(vox, aabb, H, W, K, pose, f=4):
+    bs = 4096
+
+    ro, rd = subpixel_rays_from_img(H, W, K, pose, f=f)
+    ro, rd, t_min, t_max = scene_box_filter(ro, rd, aabb)
+    n = len(ro)
+    ro, rd, t_min, t_max = as_torch_tsrs(vox.device, ro, rd, t_min, t_max)
+
+    rgbs = torch.zeros(n, 4, device=vox.device)
+    depth = torch.zeros(n, 1, device=vox.device)
+
+    with torch.no_grad():
+        for i in range(int(np.ceil(n / bs))):
+            s = i * bs
+            e = min(n, s + bs)
+            _rgbs, _depth, _ = render_ray_bundle(
+                vox, ro[s:e], rd[s:e], t_min[s:e], t_max[s:e]
+            )
+            rgbs[s:e] = _rgbs
+            depth[s:e] = _depth
+
+    rgbs = rearrange(rgbs, "(h w) c -> 1 c h w", h=H*f, w=W*f)
+    depth = rearrange(depth, "(h w) 1 -> h w", h=H*f, w=W*f)
+    rgbs = torch.nn.functional.interpolate(
+        rgbs, (decoder_bottleneck_hw, decoder_bottleneck_hw),
+        mode='bilinear', antialias=True
+    )
+    return rgbs, depth
+
+
+if __name__ == "__main__":
+    final_vis()
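The core of the new script is supersample-then-downsample: `highres_render_one_view` casts `f`× denser rays via `subpixel_rays_from_img`, reassembles the results into an `(H·f, W·f)` latent image, and antialias-downsamples it to the decoder's working size (`decoder_bottleneck_hw`). A self-contained sketch of just that resampling step, with a random tensor standing in for the rendered latent (the real pipeline then feeds the result to the SD decoder):

```python
# Minimal sketch of the subpixel supersample -> antialiased downsample step.
import torch
from einops import rearrange

H, W, f = 64, 64, 4              # placeholder latent size and subpixel factor
decoder_bottleneck_hw = 128      # target size, as in highres_final_vis.py

# stand-in for the per-ray RGBA values returned by the ray-bundle renderer
rgbs = torch.rand(H * f * W * f, 4)

rgbs = rearrange(rgbs, "(h w) c -> 1 c h w", h=H * f, w=W * f)
rgbs = torch.nn.functional.interpolate(
    rgbs, (decoder_bottleneck_hw, decoder_bottleneck_hw),
    mode="bilinear", antialias=True,   # averaging the subpixel samples is what smooths the renders
)
print(rgbs.shape)   # torch.Size([1, 4, 128, 128])
```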
voxnerf/vox.py CHANGED
@@ -169,9 +169,6 @@ class VoxRF(nn.Module):
 
 @VOXRF_REGISTRY.register()
 class V_SJC(VoxRF):
-    """
-    For SJC, when sampling density σ, add a gaussian ball offset
-    """
     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)
         # rendering color in [-1, 1] range, since score models all operate on centered img