---
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
---
We use the powerful [TinyLLaVA Factory](https://github.com/TinyLLaVA/TinyLLaVA_Factory) to create a very small image-text-to-text model with only 296M parameters.
The goal is to make it possible to run LLaVA models on edge devices with only a few gigabytes of memory.
For the LLM and vision tower, we chose [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) and [dinov2-small](https://huggingface.co/facebook/dinov2-small), respectively.
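The 296M total is roughly the sum of the two components plus a small multimodal connector. As a sanity check on the parameter budget (the ~22M figure for dinov2-small is an approximation, and the connector size below is inferred as the remainder rather than taken from the model config):

```python
# Rough parameter budget for this 296M-param TinyLLaVA variant.
# Component sizes are approximate; the connector size is an
# assumption derived as the remainder, not a reported number.
llm_params = 270e6      # OpenELM-270M-Instruct
vision_params = 22e6    # facebook/dinov2-small (~22M params)
total_params = 296e6    # reported total model size

connector_params = total_params - llm_params - vision_params
print(f"implied connector size: {connector_params / 1e6:.0f}M params")
```

This back-of-the-envelope check shows the LLM dominates the footprint, which is why swapping in a sub-300M language model is the main lever for fitting on edge devices.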