- assets/40_prompt_images/A 3D scan of AK47, weapon.jpeg +0 -0
- assets/40_prompt_images/A DSLR photo of Sydney Opera House.jpg +0 -0
- assets/40_prompt_images/A bald eagle carved out of wood.jpg +0 -0
- assets/40_prompt_images/A bulldog wearing a black pirate hat.jpeg +0 -0
- assets/40_prompt_images/A crab, low poly.jpg +0 -0
- assets/40_prompt_images/A photo of a horse walking.jpeg +0 -0
- assets/40_prompt_images/A pig wearing a backpack.jpeg +0 -0
- assets/40_prompt_images/A product photo of a toy tank.jpg +0 -0
- assets/40_prompt_images/A see no evil monkey on a kick drum.jpg +0 -0
- assets/40_prompt_images/A statue of angel, blender.jpg +0 -0
- assets/40_prompt_images/Corgi riding a rocket.jpeg +0 -0
- assets/40_prompt_images/Daenerys Targaryen from game of throne.jpg +0 -0
- assets/40_prompt_images/Darth Vader helmet,g highly detailed.jpg +0 -0
- assets/40_prompt_images/Dragon armor.jpeg +0 -0
- assets/40_prompt_images/Fisherman House, cute, cartoon, blender, stylized.jpg +0 -0
- assets/40_prompt_images/Flying Dragon, highly detailed, breathing fire.jpeg +0 -0
- assets/40_prompt_images/Handpainted watercolor windmill, hand-painted.jpg +0 -0
- assets/40_prompt_images/Katana.jpeg +0 -0
- assets/40_prompt_images/Little italian town, hand-painted style.jpg +0 -0
- assets/40_prompt_images/Mr Bean Cartoon doing a T Pose.jpg +0 -0
- assets/40_prompt_images/Pedestal Fan (White).jpeg +0 -0
- assets/40_prompt_images/Pikachu with hat.jpg +0 -0
- assets/40_prompt_images/Samurai koala bear.jpg +0 -0
- assets/40_prompt_images/TRUMP figure.jpg +0 -0
- assets/40_prompt_images/Viking axe, fantasy, weapon, blender, 8k, HD.jpg +0 -0
- assets/40_prompt_images/a DSLR photo of a frog wearing a sweater.jpg +0 -0
- assets/40_prompt_images/a DSLR photo of a ghost eating a hamburger.jpg +0 -0
- assets/40_prompt_images/a DSLR photo of a peacock on a surfboard.jpeg +0 -0
- assets/40_prompt_images/a DSLR photo of a squirrel playing guitar.jpg +0 -0
- assets/40_prompt_images/a DSLR photo of an eggshell broken in two with an adorable chick standing next to it.jpeg +0 -0
- assets/40_prompt_images/an astronaut riding a horse.jpeg +0 -0
- assets/40_prompt_images/animal skull pile.jpg +0 -0
- assets/40_prompt_images/army Jacket, 3D scan.jpg +0 -0
- assets/40_prompt_images/baby yoda in the style of Mormookiee.jpg +0 -0
- assets/40_prompt_images/beautiful, intricate butterfly.jpg +0 -0
- assets/40_prompt_images/girl riding wolf, cute, cartoon, blender.jpg +0 -0
- assets/40_prompt_images/mecha vampire girl chibi.jpg +0 -0
- assets/40_prompt_images/military Mech, future, scifi.jpg +0 -0
- assets/40_prompt_images/motorcycle, scifi, blender.jpeg +0 -0
- assets/40_prompt_images/saber from fate stay night, 3D, girl, anime.jpeg +0 -0
- +25 -0
- lrm/ +5 -0
- lrm/ +138 -0
- lrm/ +232 -0
- lrm/models/ +5 -0
# Code of Conduct
## Our Pledge
In the interest of fostering an open and welcoming environment, we as
size, disability, ethnicity, sex characteristics, gender identity and expression,
## Our Standards
Examples of behavior that contributes to creating a positive environment
* Using welcoming and inclusive language
* Gracefully accepting constructive criticism
* Showing empathy towards other community members
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery and unwelcome sexual attention or
* Trolling, insulting/derogatory comments, and personal or political attacks
address, without explicit permission
behavior and are expected to take appropriate and fair corrective action in
that are not aligned to this Code of Conduct, or to ban temporarily or
## Scope
This Code of Conduct applies within all project spaces, and it also applies when
project e-mail address, posting via an official social media account, or acting
reasonable belief that an individual's behavior may have a negative impact on
complaints will be reviewed and investigated and will result in a response that
Further details of specific enforcement policies may be posted separately.
Project maintainers who do not follow or enforce the Code of Conduct in good
available at
# Contributing to PoseDiffusion
## Pull Requests
3. If you've changed APIs, update the documentation.
6. If you haven't already, complete the Contributor License Agreement ("CLA").
## Contributor License Agreement ("CLA")
to do this once to work on any of Facebook's open source projects.
Complete your CLA here: <>
## Issues
clear and has sufficient instructions to be able to reproduce the issue.
Facebook has a [bounty program]( for the safe
outlined on that page and do not file a public issue.
## License
under the LICENSE file in the root directory of this source tree.
# [ECCV 2024] VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models

[Project page](, [Paper link](

VFusion3D is a large, feed-forward 3D generative model trained with a small amount of 3D data and a large volume of synthetic multi-view data. It is the first work exploring scalable 3D generative/reconstruction models as a step towards a 3D foundation.

[VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models](<br>
[Junlin Han](, [Filippos Kokkinos](, [Philip Torr](<br>
GenAI, Meta and TVG, University of Oxford<br>
European Conference on Computer Vision (ECCV), 2024


## News

- [25.07.2024] Release weights and inference code for VFusion3D.


## Results and Comparisons

### 3D Generation Results
<img src='images/gif1.gif' width=950>

<img src='images/gif2.gif' width=950>

### User Study Results
<img src='images/user.png' width=950>

## Setup

### Installation

    git clone
    cd vfusion3d

### Environment
We provide a simple installation script that, by default, sets up a conda environment with Python 3.8.19, PyTorch 2.3, and CUDA 12.1. Similar package versions should also work.
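
To check whether an existing environment roughly matches these versions, a minimal sketch like the following can help. It assumes the packages are installed under the distribution names `torch` and `torchvision` and reports `None` for anything missing. The helper name `report_versions` is illustrative and not part of the repository.

```python
import platform
from importlib import metadata
from typing import Dict, Optional, Sequence


def report_versions(packages: Sequence[str] = ("torch", "torchvision")) -> Dict[str, Optional[str]]:
    """Return the Python version and the installed version of each package (None if missing)."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


for name, version in report_versions().items():
    print(f"{name}: {version}")
```

Comparing the printed values with the defaults listed above (Python 3.8.19, PyTorch 2.3) is usually enough to spot a mismatched environment before running inference.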

## Quick Start

### Pretrained Models

- Model weights are available here [Google Drive]( Please download them and put them inside `./checkpoints/`.

### Prepare Images
- We provide sample inputs under `assets/40_prompt_images`, the 40 MVDream prompt images used in the paper. The corresponding results are provided under `results/40_prompt_images_provided`.

### Inference
- Run the inference script to get 3D assets.
- You may specify which form of output to generate by setting the flags `--export_video` and `--export_mesh`.
- Change `--source_path` and `--dump_path` if you want to run it on other image folders.

    # Example usages
    # Render a video
    python -m lrm.inferrer --export_video --resume ./checkpoints/vfusion3dckpt

    # Export mesh
    python -m lrm.inferrer --export_mesh --resume ./checkpoints/vfusion3dckpt
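
The flags described above can also be assembled programmatically, for example when launching several runs from a script. The following is a minimal sketch, not part of the repository: the helper name `build_inference_command` and the dump path `results/40_prompt_images` are illustrative assumptions, while the flags themselves are the ones listed in this README.

```python
import shlex
from typing import List


def build_inference_command(
    checkpoint: str,
    source_path: str,
    dump_path: str,
    export_video: bool = False,
    export_mesh: bool = False,
) -> List[str]:
    """Assemble the argument list for `python -m lrm.inferrer`."""
    cmd = [
        "python", "-m", "lrm.inferrer",
        "--resume", checkpoint,
        "--source_path", source_path,
        "--dump_path", dump_path,
    ]
    # Each export flag is a simple on/off switch.
    if export_video:
        cmd.append("--export_video")
    if export_mesh:
        cmd.append("--export_mesh")
    return cmd


example_cmd = build_inference_command(
    checkpoint="./checkpoints/vfusion3dckpt",
    source_path="assets/40_prompt_images",
    dump_path="results/40_prompt_images",  # hypothetical output folder
    export_video=True,
    export_mesh=True,
)
print(shlex.join(example_cmd))
```

The resulting list can be passed to `subprocess.run(example_cmd, check=True)` from the repository root, so that the `lrm` package is importable.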

## Acknowledgement

- The inference code of VFusion3D borrows heavily from [OpenLRM](

## Citation

If you find this work useful, please cite us:

      title={VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models},
      author={Junlin Han and Filippos Kokkinos and Philip Torr},
      journal={European Conference on Computer Vision (ECCV)},

## License

- The majority of VFusion3D is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: OpenLRM as a whole is licensed under the Apache License, Version 2.0, while certain components are covered by NVIDIA's proprietary license.
- The model weights of VFusion3D are also licensed under CC-BY-NC.
from .modeling import LRMGenerator, LRMGeneratorConfig
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

# This script assumes Python 3.8.19 and CUDA 12.1. Similar package versions might still work, but they are not tested.

conda deactivate

# Set environment variables
export ENV_NAME=vfusion3d
export PYTHON_VERSION=3.8.19
export CUDA_VERSION=12.1

# Create a new conda environment and activate it
conda create -n $ENV_NAME python=$PYTHON_VERSION
conda activate $ENV_NAME
conda install pytorch=2.3.0 torchvision==0.18.0 pytorch-cuda=$CUDA_VERSION -c pytorch -c nvidia
pip install transformers
# Extras are quoted so that shells such as zsh do not expand the brackets
pip install "imageio[ffmpeg]"
pip install PyMCubes
pip install trimesh
pip install "rembg[gpu,cli]"
pip install kiui
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.


import torch
import numpy as np
import math

"""
R: (N, 3, 3)
T: (N, 3)
E: (N, 4, 4)
vector: (N, 3)
"""


def compose_extrinsic_R_T(R: torch.Tensor, T: torch.Tensor):
    """
    Compose the standard form extrinsic matrix from R and T.
    Batched I/O.
    """
    RT = torch.cat((R, T.unsqueeze(-1)), dim=-1)
    return compose_extrinsic_RT(RT)


def compose_extrinsic_RT(RT: torch.Tensor):
    """
    Compose the standard form extrinsic matrix from RT.
    Batched I/O.
    """
    # Append the homogeneous row [0, 0, 0, 1] to each (3, 4) matrix.
    return torch.cat([
        RT,
        torch.tensor([[[0, 0, 0, 1]]], dtype=torch.float32).repeat(RT.shape[0], 1, 1).to(RT.device)
    ], dim=1)


def decompose_extrinsic_R_T(E: torch.Tensor):
    """
    Decompose the standard extrinsic matrix into R and T.
    Batched I/O.
    """
    RT = decompose_extrinsic_RT(E)
    return RT[:, :, :3], RT[:, :, 3]


def decompose_extrinsic_RT(E: torch.Tensor):
    """
    Decompose the standard extrinsic matrix into RT.
    Batched I/O.
    """
    return E[:, :3, :]


def get_normalized_camera_intrinsics(intrinsics: torch.Tensor):
    """
    intrinsics: (N, 3, 2), [[fx, fy], [cx, cy], [width, height]]
    Return batched fx, fy, cx, cy
    """
    fx, fy = intrinsics[:, 0, 0], intrinsics[:, 0, 1]
    cx, cy = intrinsics[:, 1, 0], intrinsics[:, 1, 1]
    width, height = intrinsics[:, 2, 0], intrinsics[:, 2, 1]
    # Normalize by the image size so the values are resolution independent.
    fx, fy = fx / width, fy / height
    cx, cy = cx / width, cy / height
    return fx, fy, cx, cy


def build_camera_principle(RT: torch.Tensor, intrinsics: torch.Tensor):
    """
    RT: (N, 3, 4)
    intrinsics: (N, 3, 2), [[fx, fy], [cx, cy], [width, height]]
    """
    fx, fy, cx, cy = get_normalized_camera_intrinsics(intrinsics)
    return torch.cat([
        RT.reshape(-1, 12),
        fx.unsqueeze(-1), fy.unsqueeze(-1), cx.unsqueeze(-1), cy.unsqueeze(-1),
    ], dim=-1)


def build_camera_standard(RT: torch.Tensor, intrinsics: torch.Tensor):
    """
    RT: (N, 3, 4)
    intrinsics: (N, 3, 2), [[fx, fy], [cx, cy], [width, height]]
    """
    E = compose_extrinsic_RT(RT)
    fx, fy, cx, cy = get_normalized_camera_intrinsics(intrinsics)
    I = torch.stack([
        torch.stack([fx, torch.zeros_like(fx), cx], dim=-1),
        torch.stack([torch.zeros_like(fy), fy, cy], dim=-1),
        torch.tensor([[0, 0, 1]], dtype=torch.float32, device=RT.device).repeat(RT.shape[0], 1),
    ], dim=1)
    return torch.cat([
        E.reshape(-1, 16),
        I.reshape(-1, 9),
    ], dim=-1)


def center_looking_at_camera_pose(camera_position: torch.Tensor, look_at: torch.Tensor = None, up_world: torch.Tensor = None):
    """
    camera_position: (M, 3)
    look_at: (3)
    up_world: (3)
    return: (M, 3, 4)
    """
    # by default, looking at the origin and world up is pos-z
    if look_at is None:
        look_at = torch.tensor([0, 0, 0], dtype=torch.float32, device=camera_position.device)
    if up_world is None:
        up_world = torch.tensor([0, 0, 1], dtype=torch.float32, device=camera_position.device)
    look_at = look_at.unsqueeze(0).repeat(camera_position.shape[0], 1)
    up_world = up_world.unsqueeze(0).repeat(camera_position.shape[0], 1)

    z_axis = camera_position - look_at
    z_axis = z_axis / z_axis.norm(dim=-1, keepdim=True)
    # dim=-1 is passed explicitly: without it, torch.cross picks the first
    # dimension of size 3, which is the batch dimension when M == 3.
    x_axis = torch.cross(up_world, z_axis, dim=-1)
    x_axis = x_axis / x_axis.norm(dim=-1, keepdim=True)
    y_axis = torch.cross(z_axis, x_axis, dim=-1)
    y_axis = y_axis / y_axis.norm(dim=-1, keepdim=True)
    extrinsics = torch.stack([x_axis, y_axis, z_axis, camera_position], dim=-1)
    return extrinsics


def get_surrounding_views(M, radius, elevation):
    # convert spherical coordinates (radius, azimuth, elevation) to Cartesian coordinates (x, y, z).
    camera_positions = []
    # small random azimuth offset of at most one degree
    rand_theta = np.random.uniform(0, np.pi / 180)
    elevation = math.radians(elevation)
    for i in range(M):
        theta = 2 * math.pi * i / M + rand_theta
        x = radius * math.cos(theta) * math.cos(elevation)
        y = radius * math.sin(theta) * math.cos(elevation)
        z = radius * math.sin(elevation)
        camera_positions.append([x, y, z])
    camera_positions = torch.tensor(camera_positions, dtype=torch.float32)
    extrinsics = center_looking_at_camera_pose(camera_positions)

    return extrinsics
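
The camera geometry used by `get_surrounding_views` and `center_looking_at_camera_pose` can be followed without tensors. The sketch below is a pure-Python illustration, not the repository's implementation: the helper names `orbit_positions` and `look_at_axes` are hypothetical, and the random azimuth offset of the original is replaced by an explicit `start_angle` parameter so the output is deterministic.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def orbit_positions(num_views: int, radius: float, elevation_deg: float,
                    start_angle: float = 0.0) -> List[Vec3]:
    """Place cameras on a circle around the z-axis (spherical to Cartesian)."""
    elevation = math.radians(elevation_deg)
    positions = []
    for i in range(num_views):
        theta = 2 * math.pi * i / num_views + start_angle
        x = radius * math.cos(theta) * math.cos(elevation)
        y = radius * math.sin(theta) * math.cos(elevation)
        z = radius * math.sin(elevation)
        positions.append((x, y, z))
    return positions


def _normalize(v: Vec3) -> Vec3:
    norm = math.sqrt(sum(c * c for c in v))
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def look_at_axes(position: Vec3, target: Vec3 = (0.0, 0.0, 0.0),
                 up: Vec3 = (0.0, 0.0, 1.0)) -> Tuple[Vec3, Vec3, Vec3]:
    """Return the camera x, y, z axes for a camera at `position` facing `target`."""
    # The z-axis points from the target towards the camera.
    z_axis = _normalize((position[0] - target[0],
                         position[1] - target[1],
                         position[2] - target[2]))
    x_axis = _normalize(_cross(up, z_axis))
    y_axis = _normalize(_cross(z_axis, x_axis))
    return x_axis, y_axis, z_axis


positions = orbit_positions(num_views=4, radius=1.5, elevation_deg=0.0)
x_axis, y_axis, z_axis = look_at_axes(positions[0])
print(positions[0], x_axis, y_axis, z_axis)
```

Stacking the three axes and the camera position as columns gives the (3, 4) extrinsic matrix that the batched function returns for each view. Note that the construction degenerates when the camera sits directly above or below the target, because `up` and `z_axis` are then parallel.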
import torch
import math
import os
import imageio
import mcubes
import trimesh
import numpy as np
import argparse
from torchvision.utils import save_image
from PIL import Image
import glob
from .models.generator import LRMGenerator  # Make sure this import is correct
from .cam_utils import build_camera_principle, build_camera_standard, center_looking_at_camera_pose  # Make sure this import is correct
from functools import partial
from rembg import remove, new_session
from kiui.op import recenter
import kiui

class LRMInferrer:
    def __init__(self, model_name: str, resume: str):
        print("Initializing LRMInferrer")
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        _model_kwargs = {'camera_embed_dim': 1024, 'rendering_samples_per_ray': 128, 'transformer_dim': 1024, 'transformer_layers': 16, 'transformer_heads': 16, 'triplane_low_res': 32, 'triplane_high_res': 64, 'triplane_dim': 80, 'encoder_freeze': False}

        self.model = self._build_model(_model_kwargs).eval().to(self.device)
        checkpoint = torch.load(resume, map_location='cpu')
        state_dict = checkpoint['model_state_dict']
        # Copy the checkpoint weights into the model before freeing them.
        self.model.load_state_dict(state_dict)
        del checkpoint, state_dict


    def __enter__(self):
        print("Entering context")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        print("Exiting context")
        if exc_type:
            print(f"Exception type: {exc_type}")
            print(f"Exception value: {exc_val}")
            print(f"Traceback: {exc_tb}")

    def _build_model(self, model_kwargs):
        print("Building model")
        model = LRMGenerator(**model_kwargs).to(self.device)
        print("Loaded model from checkpoint")
        return model

    @staticmethod
    def get_surrounding_views(M, radius, elevation):
        camera_positions = []
        rand_theta = np.random.uniform(0, np.pi/180)
        elevation = math.radians(elevation)
        for i in range(M):
            theta = 2 * math.pi * i / M + rand_theta
            x = radius * math.cos(theta) * math.cos(elevation)
            y = radius * math.sin(theta) * math.cos(elevation)
            z = radius * math.sin(elevation)
            camera_positions.append([x, y, z])
        camera_positions = torch.tensor(camera_positions, dtype=torch.float32)
        extrinsics = center_looking_at_camera_pose(camera_positions)
        return extrinsics

    @staticmethod
    def _default_intrinsics():
        fx = fy = 384
        cx = cy = 256
        w = h = 512
        intrinsics = torch.tensor([
            [fx, fy],
            [cx, cy],
            [w, h],
        ], dtype=torch.float32)
        return intrinsics

    def _default_source_camera(self, batch_size: int = 1):
        dist_to_center = 1.5
        canonical_camera_extrinsics = torch.tensor([[
            [0, 0, 1, 1],
            [1, 0, 0, 0],
            [0, 1, 0, 0],
        ]], dtype=torch.float32)
        canonical_camera_intrinsics = self._default_intrinsics().unsqueeze(0)
        source_camera = build_camera_principle(canonical_camera_extrinsics, canonical_camera_intrinsics)
        return source_camera.repeat(batch_size, 1)

    def _default_render_cameras(self, batch_size: int = 1):
        # 160 views on a circle of radius 1.5 at zero elevation
        render_camera_extrinsics = self.get_surrounding_views(160, 1.5, 0)
        render_camera_intrinsics = self._default_intrinsics().unsqueeze(0).repeat(render_camera_extrinsics.shape[0], 1, 1)
        render_cameras = build_camera_standard(render_camera_extrinsics, render_camera_intrinsics)
        return render_cameras.unsqueeze(0).repeat(batch_size, 1, 1)

    @staticmethod
    def images_to_video(images, output_path, fps, verbose=False):
        os.makedirs(os.path.dirname(output_path), exist_ok=True)
        frames = []
        for i in range(images.shape[0]):
            # (C, H, W) float image in [0, 1] -> (H, W, C) uint8 frame
            frame = (images[i].permute(1, 2, 0).cpu().numpy() * 255).astype(np.uint8)
            assert frame.shape[0] == images.shape[2] and frame.shape[1] == images.shape[3], \
                f"Frame shape mismatch: {frame.shape} vs {images.shape}"
            assert frame.min() >= 0 and frame.max() <= 255, \
                f"Frame value out of range: {frame.min()} ~ {frame.max()}"
            frames.append(frame)
        imageio.mimwrite(output_path, np.stack(frames), fps=fps)
        if verbose:
            print(f"Saved video to {output_path}")

    def infer_single(self, image: torch.Tensor, render_size: int, mesh_size: int, export_video: bool, export_mesh: bool):
        print("infer_single called")
        mesh_thres = 1.0
        chunk_size = 2
        batch_size = 1

        source_camera = self._default_source_camera(batch_size).to(self.device)
        render_cameras = self._default_render_cameras(batch_size).to(self.device)

        with torch.no_grad():
            planes = self.model.forward(image, source_camera)
            results = {}

            if export_video:
                print("Starting export_video")
                frames = []
                # Render the views in small chunks to limit memory use.
                for i in range(0, render_cameras.shape[1], chunk_size):
                    print(f"Processing chunk {i} to {i + chunk_size}")
                    # NOTE: only the camera slice survives in the source listing.
                    # The remaining arguments (render size, crop size, crop offsets)
                    # are assumed here and should be checked against the synthesizer.
                    frames.append(
                        self.model.synthesizer(
                            planes,
                            render_cameras[:, i:i+chunk_size],
                            render_size,
                            render_size,
                            0,
                            0
                        )
                    )
                # Merge the per-chunk outputs along the view dimension.
                frames = {
                    k: torch.cat([r[k] for r in frames], dim=1)
                    for k in frames[0].keys()
                }
                results.update({
                    'frames': frames,
                })
                print("Finished export_video")

            if export_mesh:
                print("Starting export_mesh")
                grid_out = self.model.synthesizer.forward_grid(
                    planes=planes,
                    grid_size=mesh_size,
                )

                vtx, faces = mcubes.marching_cubes(grid_out['sigma'].float().squeeze(0).squeeze(-1).cpu().numpy(), mesh_thres)
                # Map grid indices in [0, mesh_size - 1] to coordinates in [-1, 1].
                vtx = vtx / (mesh_size - 1) * 2 - 1
                vtx_tensor = torch.tensor(vtx, dtype=torch.float32, device=self.device).unsqueeze(0)
                vtx_colors = self.model.synthesizer.forward_points(planes, vtx_tensor)['rgb'].float().squeeze(0).cpu().numpy()
                vtx_colors = (vtx_colors * 255).astype(np.uint8)
                mesh = trimesh.Trimesh(vertices=vtx, faces=faces, vertex_colors=vtx_colors)

                results.update({
                    'mesh': mesh,
                })
                print("Finished export_mesh")

        return results

    def infer(self, source_image: str, dump_path: str, source_size: int, render_size: int, mesh_size: int, export_video: bool, export_mesh: bool):
        print("infer called")
        session = new_session("isnet-general-use")
        rembg_remove = partial(remove, session=session)
        image_name = os.path.basename(source_image)
        uid = image_name.split('.')[0]

        # Remove the background, then recenter the object using its mask.
        image = kiui.read_image(source_image, mode='uint8')
        image = rembg_remove(image)
        mask = rembg_remove(image, only_mask=True)
        image = recenter(image, mask, border_ratio=0.20)
        os.makedirs(dump_path, exist_ok=True)

        image = torch.tensor(np.array(image)).permute(2, 0, 1).unsqueeze(0) / 255.0
        if image.shape[1] == 4:
            # Composite the RGBA image onto a white background.
            image = image[:, :3, ...] * image[:, 3:, ...] + (1 - image[:, 3:, ...])
        image = torch.nn.functional.interpolate(image, size=(source_size, source_size), mode='bicubic', align_corners=True)
        image = torch.clamp(image, 0, 1)
        save_image(image, os.path.join(dump_path, f'{uid}.png'))

        results = self.infer_single(
            image.to(self.device),
            render_size=render_size,
            mesh_size=mesh_size,
            export_video=export_video,
            export_mesh=export_mesh,
        )

        if 'frames' in results:
            renderings = results['frames']
            for k, v in renderings.items():
                if k == 'images_rgb':
                    # NOTE: the fps value is elided in the source listing;
                    # 40 is assumed here, following OpenLRM.
                    self.images_to_video(
                        v[0],
                        os.path.join(dump_path, f'{uid}.mp4'),
                        fps=40,
                    )
            print(f"Export video success to {dump_path}")

        if 'mesh' in results:
            mesh = results['mesh']
            mesh.export(os.path.join(dump_path, f'{uid}.obj'), 'obj')

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_name', type=str, default='lrm-base-obj-v1')
    parser.add_argument('--source_path', type=str, default='./assets/cat.png')
    parser.add_argument('--dump_path', type=str, default='./results/single_image')
    parser.add_argument('--source_size', type=int, default=512)
    parser.add_argument('--render_size', type=int, default=384)
    parser.add_argument('--mesh_size', type=int, default=512)
    parser.add_argument('--export_video', action='store_true')
    parser.add_argument('--export_mesh', action='store_true')
    parser.add_argument('--resume', type=str, required=True, help='Path to the checkpoint to load for inference')
    args = parser.parse_args()

    with LRMInferrer(model_name=args.model_name, resume=args.resume) as inferrer:
        with torch.autocast(device_type="cuda", cache_enabled=False, dtype=torch.float32):
            print("Start inference for image:", args.source_path)
            inferrer.infer(
                source_image=args.source_path,
                dump_path=args.dump_path,
                source_size=args.source_size,
                render_size=args.render_size,
                mesh_size=args.mesh_size,
                export_video=args.export_video,
                export_mesh=args.export_mesh,
            )
            print("Finished inference for image:", args.source_path)
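
Two small pieces of arithmetic in `infer` and `infer_single` are easy to misread: compositing an RGBA pixel onto a white background, and rescaling marching-cubes vertices from grid indices to the `[-1, 1]` cube. The following pure-Python sketch illustrates both on scalar values; the helper names `composite_on_white` and `grid_index_to_unit_cube` are hypothetical and do not appear in the repository.

```python
from typing import Tuple


def composite_on_white(rgb: Tuple[float, float, float], alpha: float) -> Tuple[float, float, float]:
    """Blend an RGB pixel (values in [0, 1]) onto white using its alpha.

    Mirrors `rgb * alpha + (1 - alpha)`: a fully transparent pixel becomes white,
    a fully opaque pixel keeps its color.
    """
    return tuple(c * alpha + (1.0 - alpha) for c in rgb)


def grid_index_to_unit_cube(index: float, mesh_size: int) -> float:
    """Map a grid coordinate in [0, mesh_size - 1] to the range [-1, 1].

    Mirrors `vtx / (mesh_size - 1) * 2 - 1`.
    """
    return index / (mesh_size - 1) * 2 - 1


print(composite_on_white((0.2, 0.4, 0.6), alpha=0.0))
print(composite_on_white((0.0, 0.0, 0.0), alpha=0.5))
print(grid_index_to_unit_cube(0, 512), grid_index_to_unit_cube(511, 512))
```

With the default `--mesh_size` of 512, grid index 0 maps to -1 and index 511 maps to 1, so the exported mesh spans the same normalized cube that is used to query vertex colors.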
@@ -0,0 +1,5 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.