|
--- |
|
library_name: transformers |
|
tags: |
|
- robotics |
|
- vlm |
|
- image-text-to-text |
|
- multimodal |
|
- pretraining |
|
license: mit |
|
language: |
|
- en |
|
pipeline_tag: image-text-to-text |
|
--- |
|
|
|
# Prism with Qwen 2.5 0.5B backbone (Prismatic-Compatible Version) |
|
|
|
This is a Prismatic-compatible vision-language model with a Qwen 2.5 0.5B language backbone, trained on the LLaVA-1.5 instruct dataset.
|
|
|
## Usage Instructions |
|
|
|
See the [MiniVLA GitHub README](https://github.com/Stanford-ILIAD/openvla-mini/blob/main/README.md) for instructions on how to use this checkpoint for downstream training and finetuning. |
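
For a quick sanity check of the checkpoint, a minimal inference sketch is below. It assumes the `prismatic` package from the openvla-mini codebase is installed and that its `load()` entry point resolves Hugging Face Hub IDs or local paths; the model ID and image URL are placeholders, so defer to the README above for the authoritative setup.

```python
# Minimal inference sketch. Assumptions: the `prismatic` package from the
# openvla-mini repo is installed, and `load()` accepts a Hub ID or local path.
# The model ID and image URL below are placeholders -- substitute your own.
import requests
import torch
from PIL import Image

from prismatic import load

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical Hub ID for this checkpoint; replace with the actual path.
vlm = load("Stanford-ILIAD/prism-qwen25-dinosiglip-224px-0_5b")
vlm.to(device, dtype=torch.bfloat16)

# Fetch an example image and build a single-turn prompt in the model's chat format.
image = Image.open(requests.get("https://example.com/image.jpg", stream=True).raw).convert("RGB")
prompt_builder = vlm.get_prompt_builder()
prompt_builder.add_turn(role="human", message="What is in this image?")

# Generate a response conditioned on the image and the formatted prompt.
generated_text = vlm.generate(
    image,
    prompt_builder.get_prompt(),
    do_sample=True,
    temperature=0.4,
    max_new_tokens=512,
)
print(generated_text)
```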
|
|
|
## Citation |
|
|
|
**BibTeX:** |
|
|
|
```bibtex
@article{belkhale24minivla,
    title={MiniVLA: A Better VLA with a Smaller Footprint},
    author={Suneel Belkhale and Dorsa Sadigh},
    url={https://github.com/Stanford-ILIAD/openvla-mini},
    year={2024}
}
```