Commit 423cc6c by skaramcheti
1 Parent(s): 742ce85

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -14,8 +14,7 @@ pipeline_tag: image-text-to-text
 OpenVLA 7B (`openvla-7b`) is an open vision-language-action model trained on 970K robot manipulation episodes from the [Open X-Embodiment](https://robotics-transformer-x.github.io/) dataset.
 The model takes language instructions and camera images as input and generates robot actions. It supports controlling multiple robots out-of-the-box, and can be quickly adapted for new robot domains via (parameter-efficient) fine-tuning.
 
-All OpenVLA checkpoints are released under an MIT License. We additionally release our [pretraining and fine-tuning codebase](https://github.com/openvla/openvla) under
-the same license.
+All OpenVLA checkpoints, as well as our [training codebase](https://github.com/openvla/openvla) are released under an MIT License.
 
 For full details of our model and pretraining procedure please read [our paper](https://openvla.github.io/) and see [our project page](https://openvla.github.io/).
 
@@ -41,7 +40,7 @@ per-dataset basis. See [our repository](https://github.com/openvla/openvla) for
 
 OpenVLA models can be used zero-shot to control robots for specific combinations of embodiments and domains seen in the Open-X pretraining mixture (e.g., for
 [BridgeV2 environments with a Widow-X robot](https://rail-berkeley.github.io/bridgedata/)). They can also be efficiently *fine-tuned* for new tasks and robot setups
-given minimal demonstration data; [we provide example scripts for full and parameter-efficient finetuning](https://github.com/openvla/openvla/blob/main/scripts/finetune.py).
+given minimal demonstration data; [see here](https://github.com/openvla/openvla/blob/main/scripts/finetune.py).
 
 **Out-of-Scope:** OpenVLA models do not zero-shot generalize to new (unseen) robot embodiments, or setups that are not represented in the pretraining mix; in these cases,
 we suggest collecting a dataset of demonstrations on the desired setup, and fine-tuning OpenVLA models instead.
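
The README text in this diff describes zero-shot control for embodiments seen in pretraining (e.g., BridgeV2 with a Widow-X robot). Below is a minimal inference sketch, assuming the `openvla/openvla-7b` checkpoint loads through Hugging Face `transformers` with `trust_remote_code=True` and that its remote code exposes a `predict_action` method with an `unnorm_key` selecting BridgeV2 (`bridge_orig`) action statistics; the prompt string, image source, and robot call are illustrative placeholders.

```python
# Zero-shot inference sketch (illustrative; assumes the checkpoint's remote code
# provides `predict_action` and a `bridge_orig` un-normalization key).
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

# Grab a camera frame and build the instruction prompt.
image = Image.open("frame.png")  # placeholder for a live camera capture
prompt = "In: What action should the robot take to pick up the remote?\nOut:"

# Predict a continuous action vector, de-normalized with BridgeV2 statistics.
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
# robot.act(action)  # hypothetical robot-control call for the target setup
```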
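
For the fine-tuning path referenced above (full and parameter-efficient variants in `scripts/finetune.py`), a heavily condensed LoRA sketch using the `peft` library is shown below. The dataset pipeline, action tokenization, and training loop are omitted, and the LoRA hyperparameters here are assumptions for illustration rather than the repository's settings.

```python
# Parameter-efficient fine-tuning sketch via LoRA adapters (illustrative only;
# see the repository's scripts/finetune.py for the actual recipe).
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForVision2Seq

vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Assumed LoRA settings; "all-linear" targeting requires a recent peft release.
lora_cfg = LoraConfig(r=32, lora_alpha=16, lora_dropout=0.0, target_modules="all-linear")
vla = get_peft_model(vla, lora_cfg)
vla.print_trainable_parameters()  # only the adapter weights require gradients

# ... wrap `vla` in a standard training loop over demonstration data for the new setup.
```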