Normal-Sapiens-2B

Model Details

Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution. When finetuned for human-centric vision tasks, the pretrained models generalize to in-the-wild data, even when labeled data is scarce or entirely synthetic. Sapiens-2B natively supports 1K high-resolution inference.

  • Developed by: Meta
  • Model type: Vision Transformer
  • License: Creative Commons Attribution-NonCommercial 4.0
  • Task: normal
  • Format: original
  • File: sapiens_2b_normal_render_people_epoch_70.pth

Model Card

  • Image Size: 1024 x 768 (H x W)
  • Num Parameters: 2.163 B
  • FLOPs: 8.709 TFLOPs
  • Patch Size: 16 x 16
  • Embedding Dimensions: 1920
  • Num Layers: 48
  • Num Heads: 32
  • Feedforward Channels: 7680
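
The figures above can be cross-checked with a little arithmetic. The sketch below derives the token count, the per-head dimension, and a rough backbone parameter count. It assumes a standard ViT block layout (four attention projection matrices plus a two-layer MLP, with biases, norms, and embeddings ignored); that layout is an assumption for illustration, not something this card states.

```python
# Architecture figures copied from the model card.
image_h, image_w = 1024, 768
patch = 16
embed_dim = 1920
num_layers = 48
num_heads = 32
ffn_dim = 7680

# Number of patch tokens the transformer processes per image.
tokens_h = image_h // patch
tokens_w = image_w // patch
num_tokens = tokens_h * tokens_w

# Width of each attention head.
head_dim = embed_dim // num_heads

# Assumed standard ViT block: Q, K, V and output projections, then a 2-layer MLP.
attn_params = 4 * embed_dim * embed_dim
mlp_params = 2 * embed_dim * ffn_dim
approx_backbone_params = num_layers * (attn_params + mlp_params)

print(f"tokens: {tokens_h} x {tokens_w} = {num_tokens}")
print(f"head dim: {head_dim}")
print(f"approx. backbone weights: {approx_backbone_params / 1e9:.3f} B")
```

Under these assumptions the estimate comes to about 2.12 B weights, close to the 2.163 B listed above. The gap is presumably made up of embeddings, biases, normalization layers, and the task head.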

More Resources

Uses

The Normal 2B model can be used to estimate surface normals (XYZ) on human images.
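
Surface normal maps are commonly inspected by mapping each XYZ vector to an RGB color. The following is a minimal sketch of that visualization step only. It assumes the prediction is already available as an (H, W, 3) NumPy array of XYZ components; the helper name `normals_to_rgb` is hypothetical and is not part of the Sapiens code.

```python
import numpy as np

def normals_to_rgb(normals: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Map an (H, W, 3) array of XYZ normals to an (H, W, 3) uint8 RGB image."""
    # Rescale every vector to unit length; eps guards against division by zero.
    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    unit = normals / np.maximum(length, eps)
    # Shift each component from [-1, 1] to [0, 255].
    rgb = (unit + 1.0) * 0.5 * 255.0
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)

# Stand-in data for illustration: every pixel has its normal along +Z.
fake_normals = np.zeros((4, 3, 3), dtype=np.float32)
fake_normals[..., 2] = 2.0  # deliberately not unit length
image = normals_to_rgb(fake_normals)
print(image.shape, image.dtype)
```

Normalizing before the color mapping keeps the visualization stable even if the raw predictions are not exactly unit length.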
