---
pipeline_tag: image-feature-extraction
license: mit
tags:
- embodied-ai
- representation-learning
- spatial awareness
- spatial intelligence
---

# Model Card for SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

This repository hosts the pre-trained checkpoints of [SPA](https://haoyizhu.github.io/spa/).

SPA is a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI. 
It leverages differentiable neural rendering on multi-view images to endow a vanilla Vision Transformer (ViT) with 
intrinsic spatial understanding. We also present the most comprehensive evaluation of embodied representation learning to date, 
covering 268 tasks across 8 simulators with diverse policies in both single-task and language-conditioned multi-task scenarios.

## Model Details

### Model Description

- **Developed by:** [Haoyi Zhu](https://www.haoyizhu.site/)
- **Model type:** Embodied AI Representation Learning
- **Encoder (Backbone) type:** Vision Transformer (ViT)
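
Since the encoder is a ViT, its output is a flat sequence of patch tokens that downstream policies often reshape into a spatial feature map. The sketch below illustrates only that generic ViT bookkeeping in plain Python. It is not SPA's API: the helper names `patch_grid` and `tokens_to_feature_map` are hypothetical, and the 224×224 input size and patch size of 16 are assumed values that this card does not confirm. See the repository for the actual loading and inference code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchGrid:
    """Spatial layout of ViT patch tokens (hypothetical helper, not part of SPA)."""
    rows: int
    cols: int

    @property
    def num_tokens(self) -> int:
        return self.rows * self.cols


def patch_grid(height: int, width: int, patch_size: int) -> PatchGrid:
    """Compute how many non-overlapping patches tile an image."""
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    return PatchGrid(height // patch_size, width // patch_size)


def tokens_to_feature_map(tokens: list, grid: PatchGrid) -> list:
    """Reshape a flat, row-major token list into a rows x cols grid."""
    if len(tokens) != grid.num_tokens:
        raise ValueError("token count does not match the patch grid")
    return [tokens[r * grid.cols:(r + 1) * grid.cols] for r in range(grid.rows)]


# Assumed values for illustration only: 224x224 input, 16x16 patches.
grid = patch_grid(224, 224, 16)
tokens = list(range(grid.num_tokens))  # stand-ins for per-patch feature vectors
feature_map = tokens_to_feature_map(tokens, grid)
```

Under these assumed sizes the encoder would emit a 14×14 grid of patch tokens. A class token, if the checkpoint uses one, would be handled separately and is left out of this sketch.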

### Model Sources

- **Repository:** [https://github.com/HaoyiZhu/SPA](https://github.com/HaoyiZhu/SPA)
- **Paper:** [Hugging Face paper page](https://huggingface.co/papers/2410.08208)
- **Project Page:** [https://haoyizhu.github.io/spa/](https://haoyizhu.github.io/spa/)

## Citation
```bib
@article{zhu2024spa,
    title = {SPA: 3D Spatial-Awareness Enables Effective Embodied Representation},
    author = {Zhu, Haoyi and Yang, Honghui and Wang, Yating and Yang, Jiange and Wang, Limin and He, Tong},
    journal = {arXiv preprint arXiv:2410.08208},
    year = {2024},
}
```