---
pipeline_tag: image-feature-extraction
license: mit
tags:
- embodied-ai
- representation-learning
- spatial awareness
- spatial intelligence
---
# Model Card for SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
<!-- Provide a quick summary of what the model is/does. -->
This repository hosts the pre-trained checkpoints of [SPA](https://haoyizhu.github.io/spa/).
SPA is a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI.
It leverages differentiable neural rendering on multi-view images to endow a vanilla Vision Transformer (ViT) with
intrinsic spatial understanding. We also present the most comprehensive evaluation of embodied representation learning to date,
covering 268 tasks across 8 simulators with diverse policies in both single-task and language-conditioned multi-task scenarios.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [Haoyi Zhu](https://www.haoyizhu.site/)
- **Model type:** Embodied AI Representation Learning
- **Encoder (Backbone) type:** Vision Transformer (ViT)
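
Because the encoder is a vanilla ViT, the shape of the features it produces follows from simple patch arithmetic. The sketch below illustrates only that bookkeeping and is not SPA's code. The 224-pixel input and 16-pixel patch size are assumed values chosen for illustration, not settings confirmed by this card, and `patch_grid` and `tokens_to_feature_map` are hypothetical helper names.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square image."""
    if image_size <= 0 or patch_size <= 0:
        raise ValueError("image_size and patch_size must be positive")
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side, side * side


def tokens_to_feature_map(tokens: list, side: int) -> list[list]:
    """Arrange a row-major sequence of patch tokens into a side x side grid."""
    if len(tokens) != side * side:
        raise ValueError("expected side * side tokens")
    return [tokens[row * side:(row + 1) * side] for row in range(side)]


# Assumed example values (not taken from the SPA release).
side, num_tokens = patch_grid(image_size=224, patch_size=16)
feature_map = tokens_to_feature_map(list(range(num_tokens)), side)
print(side, num_tokens)   # 14 196
print(feature_map[1][0])  # 14: first token of the second patch row
```

In a real ViT each token is a feature vector rather than an integer, and a class token, if the model uses one, is set aside before the patch tokens are reshaped into a spatial map.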
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** [https://github.com/HaoyiZhu/SPA](https://github.com/HaoyiZhu/SPA)
- **Paper:** [Hugging Face paper page](https://huggingface.co/papers/2410.08208)
- **Project Page:** [https://haoyizhu.github.io/spa/](https://haoyizhu.github.io/spa/)
## Citation
```bib
@article{zhu2024spa,
title = {SPA: 3D Spatial-Awareness Enables Effective Embodied Representation},
author = {Zhu, Haoyi and Yang, Honghui and Wang, Yating and Yang, Jiange and Wang, Limin and He, Tong},
journal = {arXiv preprint arXiv:2410.08208},
year = {2024},
}
```