## Important configuration options for [inference.py](../inference.py):
### 1. General configs
| Configuration | Default | Explanation |
|:------------- |:----- | :------------- |
| `--image_dir` | './test/images/fruit.png' | Image file path |
| `--out_dir` | './output' | Output directory |
| `--device` | 'cuda:0' | The device to use |
| `--exp_name` | None | Experiment name, use image file name by default |
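
As a rough illustration of how the general options combine, the sketch below assembles an argument list for `inference.py` in Python. The helper `build_general_args` is hypothetical (it is not part of the repository), the values are the defaults listed in the table, and the sketch assumes the script is launched from the directory that contains `inference.py`.

```python
import sys
from typing import List, Optional


def build_general_args(
    image_dir: str = "./test/images/fruit.png",
    out_dir: str = "./output",
    device: str = "cuda:0",
    exp_name: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for inference.py from the general options.

    Hypothetical helper for illustration only; the flag names and
    defaults are taken from the general configs table.
    """
    args = [
        sys.executable, "inference.py",
        "--image_dir", image_dir,
        "--out_dir", out_dir,
        "--device", device,
    ]
    # --exp_name is optional: when it is omitted, the script uses the
    # image file name as the experiment name.
    if exp_name is not None:
        args += ["--exp_name", exp_name]
    return args


if __name__ == "__main__":
    cmd = build_general_args(exp_name="fruit_demo")
    print(" ".join(cmd))
    # To actually launch the run: subprocess.run(cmd, check=True)
```

The list only covers the general options. The default 'single_view_txt' mode also requires `--traj_txt`, so a real invocation needs the render options from section 2 as well.
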
### 2. Point cloud render configs
#### The definition of the world coordinate system and tips for adjusting the point cloud render configs are illustrated in the [render document](./render_help.md).
| Configuration | Default | Explanation |
|:------------- |:----- | :------------- |
| `--mode` | 'single_view_txt' | Render mode; 'single_view_txt' and 'single_view_target' are currently supported |
| `--traj_txt` | None | Required for 'single_view_txt' mode; a txt file that specifies the camera trajectory |
| `--elevation` | 5. | Elevation angle of the input image, in degrees. Estimate a rough value by visual judgment |
| `--center_scale` | 1. | Scale factor for the spherical radius (r). By default, r is set to the depth value of the center pixel (H//2, W//2) of the reference image |
| `--d_theta` | 10. | Required for 'single_view_target' mode; the target theta angle is (theta + d_theta) |
| `--d_phi` | 30. | Required for 'single_view_target' mode; the target phi angle is (phi + d_phi) |
| `--d_r` | -.2 | Required for 'single_view_target' mode; the target radius is (r + r*d_r) |
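
The 'single_view_target' offsets amount to plain arithmetic on spherical coordinates. The sketch below applies the three formulas from the table. The function name is hypothetical, and obtaining r by multiplying the center-pixel depth by `center_scale` is an assumption made for illustration; the actual axis and sign conventions are the ones defined in the render document.

```python
from typing import Tuple


def target_spherical_pose(
    theta: float,
    phi: float,
    center_depth: float,
    center_scale: float = 1.0,
    d_theta: float = 10.0,
    d_phi: float = 30.0,
    d_r: float = -0.2,
) -> Tuple[float, float, float]:
    """Return (target_theta, target_phi, target_r) for 'single_view_target'.

    Angles are in degrees. Hypothetical helper, not the repository's code.
    """
    # Assumption: the sphere radius is the center-pixel depth of the
    # reference image, scaled by --center_scale.
    r = center_depth * center_scale
    target_theta = theta + d_theta   # (theta + d_theta)
    target_phi = phi + d_phi         # (phi + d_phi)
    target_r = r + r * d_r           # (r + r*d_r): d_r is a relative change
    return target_theta, target_phi, target_r


if __name__ == "__main__":
    # With the default d_r of -0.2 the camera moves 20% closer to the center.
    print(target_spherical_pose(theta=5.0, phi=0.0, center_depth=2.0))
```

Because `d_r` is multiplied by r, it is a relative offset: negative values move the camera toward the scene center and positive values move it away, independent of the absolute depth scale.
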
### 3. Diffusion configs
| Configuration | Default | Explanation |
|:------------- |:----- | :------------- |
| `--ckpt_path` | './checkpoints/ViewCrafter_25.ckpt' | Checkpoint path |
| `--config` | './configs/inference_pvd_1024.yaml' | Config (yaml) path |
| `--ddim_steps` | 50 | Number of DDIM steps if positive; otherwise DDPM is used. Reduce to 10 to speed up inference |
| `--ddim_eta` | 1.0 | Eta for DDIM sampling (0.0 yields deterministic sampling) |
| `--bs` | 1 | Batch size for inference; should be 1 |
| `--height` | 576 | Image height, in pixel space |
| `--width` | 1024 | Image width, in pixel space |
| `--frame_stride` | 10 | Fixed |
| `--unconditional_guidance_scale` | 7.5 | Classifier-free guidance scale for the prompt |
| `--seed` | 123 | Seed for seed_everything |
| `--video_length` | 25 | Inference video length; change to 16 if you use the 16-frame model |
| `--negative_prompt` | False | Unused |
| `--text_input` | False | Unused |
| `--prompt` | 'Rotating view of a scene' | Fixed |
| `--multiple_cond_cfg` | False | Whether to use multi-condition CFG |
| `--cfg_img` | None | Guidance scale for image conditioning |
| `--timestep_spacing` | "uniform_trailing" | How the sampling timesteps are spaced. Refer to Table 2 of [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) for more information. |
| `--guidance_rescale` | 0.7 | Guidance rescale factor from [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) |
| `--perframe_ae` | True | Whether to use per-frame AE decoding; set to True to save GPU memory, especially for the 576x1024 model |
| `--n_samples` | 1 | Number of samples per prompt |
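
To make `--timestep_spacing` and `--guidance_rescale` more concrete, here is a minimal sketch of the two ideas as described in the cited paper: "trailing" timestep selection and rescaled classifier-free guidance. Both function names are hypothetical, the 1000 training steps are an assumed value, and flat Python lists stand in for tensors. The repository's implementation may differ in details such as the dimensions over which the standard deviation is computed.

```python
import statistics
from typing import List


def trailing_timesteps(num_train_steps: int = 1000, num_sample_steps: int = 50) -> List[int]:
    """'Trailing' spacing: start at the last training timestep and step
    backwards by T / S, so the final (noisiest) timestep is always included.
    Returned in sampling order (descending)."""
    stride = num_train_steps / num_sample_steps
    return [round(num_train_steps - i * stride) - 1 for i in range(num_sample_steps)]


def rescale_guidance(
    x_pos: List[float],
    x_neg: List[float],
    guidance_scale: float = 7.5,
    guidance_rescale: float = 0.7,
) -> List[float]:
    """Classifier-free guidance followed by the rescale step from the paper.

    x_pos / x_neg are the conditional / unconditional predictions.
    Assumes the guided prediction is not constant (non-zero std).
    """
    # Standard classifier-free guidance.
    x_cfg = [n + guidance_scale * (p - n) for p, n in zip(x_pos, x_neg)]
    # Rescale so the guided prediction has the same std as the conditional one.
    factor = statistics.pstdev(x_pos) / statistics.pstdev(x_cfg)
    x_rescaled = [v * factor for v in x_cfg]
    # Blend rescaled and original guided predictions with factor guidance_rescale.
    return [
        guidance_rescale * r + (1.0 - guidance_rescale) * c
        for r, c in zip(x_rescaled, x_cfg)
    ]


if __name__ == "__main__":
    print(trailing_timesteps(1000, 10))  # [999, 899, 799, 699, 599, 499, 399, 299, 199, 99]
    print(rescale_guidance([1.0, 2.0, 3.0], [0.5, 0.5, 1.0]))
```

With `guidance_rescale` set to 0 the result is ordinary classifier-free guidance; with 1 the output is fully rescaled to the standard deviation of the conditional prediction. The default of 0.7 sits between the two.
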