Logging to ./
creating model and diffusion...
creating 3DAE...
length of vit_decoder.blocks: 24
init pos_embed with sincos
length of vit_decoder.blocks: 24
ignore dim_up_mlp: True
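
The "init pos_embed with sincos" message above refers to the fixed 2D sine-cosine positional embedding commonly used to initialize ViT/DiT-style decoders (as in MAE). A minimal sketch of that initialization, assuming a square token grid and an embedding dimension divisible by 4 (the grid size used below is illustrative, not read from this log):

import numpy as np

def get_1d_sincos_pos_embed(embed_dim, positions):
    # embed_dim must be even; returns (num_positions, embed_dim)
    omega = np.arange(embed_dim // 2, dtype=np.float64)
    omega = 1.0 / 10000 ** (omega / (embed_dim / 2))
    out = np.einsum("p,d->pd", positions.reshape(-1), omega)  # outer product
    return np.concatenate([np.sin(out), np.cos(out)], axis=1)

def get_2d_sincos_pos_embed(embed_dim, grid_size):
    # half of the channels encode the y coordinate, the other half encode x
    grid_y, grid_x = np.meshgrid(
        np.arange(grid_size, dtype=np.float64),
        np.arange(grid_size, dtype=np.float64),
        indexing="ij",
    )
    emb_y = get_1d_sincos_pos_embed(embed_dim // 2, grid_y)
    emb_x = get_1d_sincos_pos_embed(embed_dim // 2, grid_x)
    return np.concatenate([emb_y, emb_x], axis=1)  # (grid_size**2, embed_dim)

# copied into a frozen pos_embed buffer, e.g. for a 1024-dim decoder on a 16x16 grid:
# pos_embed.data.copy_(torch.from_numpy(get_2d_sincos_pos_embed(1024, 16)).float())
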
AE(
  (encoder): MVEncoderGSDynamicInp(
    (conv_in): Conv2d(10, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (down): ModuleList(
      (0): Module(
        (block): ModuleList(
          (0): ResnetBlock(
            (norm1): GroupNorm(32, 64, eps=1e-06, affine=True)
            (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (norm2): GroupNorm(32, 64, eps=1e-06, affine=True)
            (dropout): Dropout(p=0.0, inplace=False)
            (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (attn): ModuleList()
        (downsample): Downsample(
          (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2))
        )
      )
      (1): Module(
        (block): ModuleList(
          (0): ResnetBlock(
            (norm1): GroupNorm(32, 64, eps=1e-06, affine=True)
            (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (norm2): GroupNorm(32, 128, eps=1e-06, affine=True)
            (dropout): Dropout(p=0.0, inplace=False)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (nin_shortcut): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
          )
        )
        (attn): ModuleList()
        (downsample): Downsample(
          (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2))
        )
      )
      (2): Module(
        (block): ModuleList(
          (0): ResnetBlock(
            (norm1): GroupNorm(32, 128, eps=1e-06, affine=True)
            (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (norm2): GroupNorm(32, 256, eps=1e-06, affine=True)
            (dropout): Dropout(p=0.0, inplace=False)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (nin_shortcut): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1))
          )
        )
        (attn): ModuleList()
        (downsample): Downsample(
          (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2))
        )
      )
      (3): Module(
        (block): ModuleList(
          (0): ResnetBlock(
            (norm1): GroupNorm(32, 256, eps=1e-06, affine=True)
            (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (norm2): GroupNorm(32, 256, eps=1e-06, affine=True)
            (dropout): Dropout(p=0.0, inplace=False)
            (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (attn): ModuleList()
      )
    )
    (mid): Module(
      (block_1): ResnetBlock(
        (norm1): GroupNorm(32, 256, eps=1e-06, affine=True)
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (norm2): GroupNorm(32, 256, eps=1e-06, affine=True)
        (dropout): Dropout(p=0.0, inplace=False)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
      (attn_1): SpatialTransformer3D(
        (norm): GroupNorm(32, 256, eps=1e-06, affine=True)
        (proj_in): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1))
        (transformer_blocks): ModuleList(
          (0): BasicTransformerBlock3D(
            (attn1): MemoryEfficientCrossAttention(
              (to_q): Linear(in_features=512, out_features=512, bias=False)
              (to_k): Linear(in_features=512, out_features=512, bias=False)
              (q_norm): Identity()
              (k_norm): Identity()
              (to_v): Linear(in_features=512, out_features=512, bias=False)
              (to_out): Sequential(
                (0): Linear(in_features=512, out_features=512, bias=True)
                (1): Dropout(p=0.0, inplace=False)
              )
            )
            (ff): FeedForward(
              (net): Sequential(
                (0): GEGLU(
                  (proj): Linear(in_features=512, out_features=4096, bias=True)
                )
                (1): Dropout(p=0.0, inplace=False)
                (2): Linear(in_features=2048, out_features=512, bias=True)
              )
            )
            (attn2): MemoryEfficientCrossAttention(
              (to_q): Linear(in_features=512, out_features=512, bias=False)
              (to_k): Linear(in_features=512, out_features=512, bias=False)
              (q_norm): Identity()
              (k_norm): Identity()
              (to_v): Linear(in_features=512, out_features=512, bias=False)
              (to_out): Sequential(
                (0): Linear(in_features=512, out_features=512, bias=True)
                (1): Dropout(p=0.0, inplace=False)
              )
            )
            (norm1): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (norm2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
            (norm3): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          )
        )
        (proj_out): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
      )
      (block_2): ResnetBlock(
        (norm1): GroupNorm(32, 256, eps=1e-06, affine=True)
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (norm2): GroupNorm(32, 256, eps=1e-06, affine=True)
        (dropout): Dropout(p=0.0, inplace=False)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (norm_out): GroupNorm(32, 256, eps=1e-06, affine=True)
    (conv_out): Conv2d(256, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (decoder): RodinSR_256_fusionv6_ConvQuant_liteSR_dinoInit3DAttn_SD_B_3L_C_withrollout_withSD_D_ditDecoder(
    (superresolution): ModuleDict(
      (ldm_upsample): PatchEmbedTriplane(
        (proj): Conv2d(12, 3072, kernel_size=(2, 2), stride=(2, 2), groups=3)
        (norm): Identity()
      )
      (quant_conv): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), groups=3)
      (conv_sr): Decoder(
        (conv_in): Conv2d(1024, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (mid): Module(
          (block_1): ResnetBlock(
            (norm1): GroupNorm(32, 128, eps=1e-06, affine=True)
            (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (norm2): GroupNorm(32, 128, eps=1e-06, affine=True)
            (dropout): Dropout(p=0.0, inplace=False)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
          (attn_1): MemoryEfficientAttnBlock(
            (norm): GroupNorm(32, 128, eps=1e-06, affine=True)
            (q): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (k): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (v): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
            (proj_out): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1))
          )
          (block_2): ResnetBlock(
            (norm1): GroupNorm(32, 128, eps=1e-06, affine=True)
            (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (norm2): GroupNorm(32, 128, eps=1e-06, affine=True)
            (dropout): Dropout(p=0.0, inplace=False)
            (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (up): ModuleList(
          (0): Module(
            (block): ModuleList(
              (0): ResnetBlock(
                (norm1): GroupNorm(32, 64, eps=1e-06, affine=True)
                (conv1): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (norm2): GroupNorm(32, 32, eps=1e-06, affine=True)
                (dropout): Dropout(p=0.0, inplace=False)
                (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (nin_shortcut): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1))
              )
              (1): ResnetBlock(
                (norm1): GroupNorm(32, 32, eps=1e-06, affine=True)
                (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (norm2): GroupNorm(32, 32, eps=1e-06, affine=True)
                (dropout): Dropout(p=0.0, inplace=False)
                (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              )
            )
            (attn): ModuleList()
          )
          (1): Module(
            (block): ModuleList(
              (0-1): 2 x ResnetBlock(
                (norm1): GroupNorm(32, 64, eps=1e-06, affine=True)
                (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (norm2): GroupNorm(32, 64, eps=1e-06, affine=True)
                (dropout): Dropout(p=0.0, inplace=False)
                (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              )
            )
            (attn): ModuleList()
            (upsample): Upsample(
              (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
          )
          (2): Module(
            (block): ModuleList(
              (0): ResnetBlock(
                (norm1): GroupNorm(32, 128, eps=1e-06, affine=True)
                (conv1): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (norm2): GroupNorm(32, 64, eps=1e-06, affine=True)
                (dropout): Dropout(p=0.0, inplace=False)
                (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (nin_shortcut): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
              )
              (1): ResnetBlock(
                (norm1): GroupNorm(32, 64, eps=1e-06, affine=True)
                (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (norm2): GroupNorm(32, 64, eps=1e-06, affine=True)
                (dropout): Dropout(p=0.0, inplace=False)
                (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              )
            )
            (attn): ModuleList()
            (upsample): Upsample(
              (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
          )
          (3): Module(
            (block): ModuleList(
              (0-1): 2 x ResnetBlock(
                (norm1): GroupNorm(32, 128, eps=1e-06, affine=True)
                (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
                (norm2): GroupNorm(32, 128, eps=1e-06, affine=True)
                (dropout): Dropout(p=0.0, inplace=False)
                (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              )
            )
            (attn): ModuleList()
            (upsample): Upsample(
              (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
          )
        )
        (norm_out): GroupNorm(32, 32, eps=1e-06, affine=True)
        (conv_out): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
    (vit_decoder): DiT2(
      (blocks): ModuleList(
        (0-23): 24 x DiTBlock2(
          (norm1): LayerNorm((1024,), eps=1e-06, elementwise_affine=False)
          (norm2): LayerNorm((1024,), eps=1e-06, elementwise_affine=False)
          (attn): MemEffAttention(
            (qkv): Linear(in_features=1024, out_features=3072, bias=True)
            (attn_drop): Dropout(p=0.0, inplace=False)
            (proj): Linear(in_features=1024, out_features=1024, bias=True)
            (proj_drop): Dropout(p=0.0, inplace=False)
            (q_norm): Identity()
            (k_norm): Identity()
          )
          (mlp): FusedMLP(
            (mlp): Sequential(
              (0): Linear(in_features=1024, out_features=4096, bias=False)
              (1): FusedDropoutBias(
                (activation_pytorch): GELU(approximate='none')
              )
              (2): Linear(in_features=4096, out_features=1024, bias=False)
              (3): FusedDropoutBias(
                (activation_pytorch): Identity()
              )
            )
          )
          (adaLN_modulation): Sequential(
            (0): SiLU()
            (1): Linear(in_features=1024, out_features=6144, bias=True)
          )
        )
      )
    )
    (triplane_decoder): Triplane(
      (renderer): ImportanceRenderer(
        (ray_marcher): MipRayMarcher2()
      )
      (ray_sampler): PatchRaySampler()
      (decoder): OSGDecoder(
        (net): Sequential(
          (0): FullyConnectedLayer(in_features=32, out_features=64, activation=linear)
          (1): Softplus(beta=1.0, threshold=20.0)
          (2): FullyConnectedLayer(in_features=64, out_features=4, activation=linear)
        )
      )
    )
    (decoder_pred): None
  )
)
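
In the repr above, each DiTBlock2 carries an adaLN_modulation head (SiLU followed by Linear(1024 -> 6144), i.e. 6 x 1024), which matches the standard DiT adaLN pattern: the conditioning vector produces per-block shift/scale/gate triples for the attention and MLP branches, applied around LayerNorms that have no learnable affine parameters (elementwise_affine=False, as printed). A hedged sketch of that pattern, using stock PyTorch modules in place of MemEffAttention and FusedMLP; the exact forward of DiTBlock2 may differ:

import torch
import torch.nn as nn

def modulate(x, shift, scale):
    # per-sample affine modulation, broadcast over the token dimension
    return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

class AdaLNBlock(nn.Module):
    def __init__(self, hidden=1024, num_heads=16, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden, eps=1e-6, elementwise_affine=False)
        self.norm2 = nn.LayerNorm(hidden, eps=1e-6, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(hidden, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, mlp_ratio * hidden), nn.GELU(),
            nn.Linear(mlp_ratio * hidden, hidden))
        self.adaLN_modulation = nn.Sequential(nn.SiLU(), nn.Linear(hidden, 6 * hidden))

    def forward(self, x, c):  # x: (B, N, hidden) tokens, c: (B, hidden) conditioning
        shift_a, scale_a, gate_a, shift_m, scale_m, gate_m = \
            self.adaLN_modulation(c).chunk(6, dim=-1)
        h = modulate(self.norm1(x), shift_a, scale_a)
        x = x + gate_a.unsqueeze(1) * self.attn(h, h, h, need_weights=False)[0]
        h = modulate(self.norm2(x), shift_m, scale_m)
        x = x + gate_m.unsqueeze(1) * self.mlp(h)
        return x
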
create dataset
joint_denoise_rec_model enables AMP to accelerate training
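
The AMP line above refers to PyTorch automatic mixed precision. The actual training loop is not part of this log, but a minimal sketch of an AMP-enabled step looks like the following, with model, optimizer, batch and loss_fn as placeholders:

import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    # run the forward pass in mixed precision where it is numerically safe
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()  # scale the loss so fp16 gradients do not underflow
    scaler.step(optimizer)         # unscale gradients and apply the optimizer update
    scaler.update()
    return loss.detach()
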
mark joint_denoise_rec_model loading
loading model from huggingface: yslan/LN3Diff/checkpoints/objaverse/objaverse-dit/i23d/model_joint_denoise_rec_model2990000.safetensors...
mark joint_denoise_rec_model loading finished
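
A minimal sketch of how a checkpoint like the one above can be fetched and loaded, assuming the repository id is yslan/LN3Diff and the in-repo file path matches the log line (the Space's own loading code may differ):

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

ckpt_path = hf_hub_download(
    repo_id="yslan/LN3Diff",
    filename="checkpoints/objaverse/objaverse-dit/i23d/"
             "model_joint_denoise_rec_model2990000.safetensors",
)
state_dict = load_file(ckpt_path)  # dict mapping parameter names to tensors
# missing, unexpected = model.load_state_dict(state_dict, strict=False)
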