metadata
license: mit
tags:
  - embodied-ai
  - representation-learning
  - spatial awareness
  - spatial intelligence

Model Card for SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Pre-trained checkpoints of SPA.

SPA is a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI. It leverages differentiable neural rendering on multi-view images to endow a vanilla Vision Transformer (ViT) with intrinsic spatial understanding. We also present the most comprehensive evaluation of embodied representation learning to date, covering 268 tasks across 8 simulators with diverse policies in both single-task and language-conditioned multi-task scenarios.
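As a rough illustration of what a vanilla ViT encoder produces for multi-view input, the sketch below computes the shape of the per-view patch features. The image size, patch size, embedding width, and view count are hypothetical values chosen for the example, not settings confirmed by this model card, and the function names are illustrative rather than part of any SPA API.

```python
def vit_token_count(image_size: int, patch_size: int) -> int:
    """Number of patch tokens a ViT produces for one square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def multiview_token_shape(num_views: int, image_size: int,
                          patch_size: int, embed_dim: int) -> tuple:
    """Shape of per-view patch features: (views, tokens, channels)."""
    return (num_views, vit_token_count(image_size, patch_size), embed_dim)


# Assumed settings (not from the model card): 4 views, 224x224 images,
# 16x16 patches, 768-dimensional features as in a typical ViT-Base.
print(multiview_token_shape(4, 224, 16, 768))
```

Each view's tokens can be reshaped back into a 2D grid (14 x 14 under these assumed settings), which is the kind of spatial feature map a rendering-based objective would operate on.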

Model Details

Model Description

  • Developed by: Haoyi Zhu
  • Model type: Embodied AI Representation Learning
  • Encoder (Backbone) type: Vision Transformer (ViT)

Citation

@article{zhu2024spa,
    title = {SPA: 3D Spatial-Awareness Enables Effective Embodied Representation},
    author = {Zhu, Haoyi and Yang, Honghui and Wang, Yating and Yang, Jiange and Wang, Limin and He, Tong},
    journal = {arXiv preprint},
    year = {2024},
}