---
pipeline_tag: image-feature-extraction
license: mit
tags:
- embodied-ai
- representation-learning
- spatial awareness
- spatial intelligence
---

# Model Card for SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

This repository hosts the pre-trained checkpoints of [SPA](https://haoyizhu.github.io/spa/).

SPA is a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI. It leverages differentiable neural rendering on multi-view images to endow a vanilla Vision Transformer (ViT) with intrinsic spatial understanding. We also present the most comprehensive evaluation of embodied representation learning to date, covering 268 tasks across 8 simulators with diverse policies in both single-task and language-conditioned multi-task scenarios.

## Model Details

### Model Description

- **Developed by:** [Haoyi Zhu](https://www.haoyizhu.site/)
- **Model type:** Embodied AI Representation Learning
- **Encoder (Backbone) type:** Vision Transformer (ViT)

### Model Sources

- **Repository:** [https://github.com/HaoyiZhu/SPA](https://github.com/HaoyiZhu/SPA)
- **Paper:** [Hugging Face paper page](https://huggingface.co/papers/2410.08208)
- **Project Page:** [https://haoyizhu.github.io/spa/](https://haoyizhu.github.io/spa/)

## Citation

```bib
@article{zhu2024spa,
    title = {SPA: 3D Spatial-Awareness Enables Effective Embodied Representation},
    author = {Zhu, Haoyi and Yang, Honghui and Wang, Yating and Yang, Jiange and Wang, Limin and He, Tong},
    journal = {arXiv preprint},
    year = {2024},
}
```