---
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
base_model:
- apple/OpenELM-270M-Instruct
- facebook/dinov2-small
---

# Introduction

We use [TinyLLaVA Factory](https://github.com/TinyLLaVA/TinyLLaVA_Factory) to build a very small image-text-to-text model with only 296M parameters.

The goal is to make LLaVA-style models runnable on edge devices with only a few gigabytes of memory.

For the LLM and the vision tower, we use [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) and [facebook/dinov2-small](https://huggingface.co/facebook/dinov2-small), respectively.

# Results

[POPE](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#pope):

| Category    | # Samples | TP   | FP  | TN   | FN  | Accuracy | Precision | Recall | F1 Score | Yes Ratio |
|-------------|-----------|------|-----|------|-----|----------|-----------|--------|----------|-----------|
| Adversarial | 3000      | 1264 | 575 | 925  | 236 | 0.7297   | 0.6873    | 0.8427 | 0.7571   | 0.6130    |
| Popular     | 3000      | 1264 | 301 | 1199 | 236 | 0.8210   | 0.8077    | 0.8427 | 0.8248   | 0.5217    |
| Random      | 2910      | 1264 | 290 | 1120 | 236 | 0.8192   | 0.8134    | 0.8427 | 0.8278   | 0.5340    |
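
The derived columns in the POPE table follow from the four confusion-matrix counts. The sketch below recomputes them, assuming the standard definitions: "yes" is the positive class, and Yes Ratio is the fraction of samples answered "yes". The helper name `pope_metrics` is hypothetical and is not taken from TinyLLaVA Factory's evaluation scripts.

```python
def pope_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Recompute POPE summary metrics from confusion-matrix counts.

    Assumes "yes" is the positive class (an assumption, not confirmed
    by the model card).
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "yes_ratio": (tp + fp) / total,  # share of "yes" predictions
    }


# Counts (TP, FP, TN, FN) copied from the table above.
rows = {
    "Adversarial": (1264, 575, 925, 236),
    "Popular": (1264, 301, 1199, 236),
    "Random": (1264, 290, 1120, 236),
}

for name, counts in rows.items():
    metrics = pope_metrics(*counts)
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```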

[TextVQA](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#textvqa):

Samples: 5000, Accuracy: 27%

[ScienceQA](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#scienceqa):

Samples: 4241, Correct: 1725, Accuracy: 40.64%, IMG-Accuracy: 36.54%

[MMMU](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#mmmu):

| Category                         | # Samples | Accuracy |
|----------------------------------|-----------|----------|
| Overall                          | 900       | 0.273    |
| Overall-Art and Design           | 120       | 0.233    |
| Art                              | 30        | 0.233    |
| Art Theory                       | 30        | 0.167    |
| Design                           | 30        | 0.367    |
| Music                            | 30        | 0.167    |
| Overall-Business                 | 150       | 0.293    |
| Accounting                       | 30        | 0.367    |
| Economics                        | 30        | 0.467    |
| Finance                          | 30        | 0.200    |
| Management                       | 30        | 0.233    |
| Marketing                        | 30        | 0.200    |
| Overall-Science                  | 150       | 0.273    |
| Biology                          | 30        | 0.267    |
| Chemistry                        | 30        | 0.100    |
| Geography                        | 30        | 0.200    |
| Math                             | 30        | 0.433    |
| Physics                          | 30        | 0.367    |
| Overall-Health and Medicine      | 150       | 0.293    |
| Basic Medical Science            | 30        | 0.333    |
| Clinical Medicine                | 30        | 0.200    |
| Diagnostics and Laboratory Med.  | 30        | 0.233    |
| Pharmacy                         | 30        | 0.333    |
| Public Health                    | 30        | 0.367    |
| Overall-Humanities and Soc. Sci. | 120       | 0.267    |
| History                          | 30        | 0.333    |
| Literature                       | 30        | 0.300    |
| Sociology                        | 30        | 0.133    |
| Psychology                       | 30        | 0.300    |
| Overall-Tech and Engineering     | 210       | 0.271    |
| Agriculture                      | 30        | 0.200    |
| Architecture and Engineering     | 30        | 0.267    |
| Computer Science                 | 30        | 0.333    |
| Electronics                      | 30        | 0.267    |
| Energy and Power                 | 30        | 0.333    |
| Materials                        | 30        | 0.267    |
| Mechanical Engineering           | 30        | 0.233    |
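
The MMMU "Overall" row can be related to the six discipline rows as a sample-weighted mean. This is a minimal sketch under that assumption; the evaluation script may instead count correct answers directly, and the inputs here are the rounded accuracies from the table, so the result only approximately reproduces the reported value. The variable names are illustrative.

```python
# (sample count, accuracy) for each "Overall-<discipline>" row in the table.
disciplines = {
    "Art and Design": (120, 0.233),
    "Business": (150, 0.293),
    "Science": (150, 0.273),
    "Health and Medicine": (150, 0.293),
    "Humanities and Soc. Sci.": (120, 0.267),
    "Tech and Engineering": (210, 0.271),
}

total_samples = sum(count for count, _ in disciplines.values())

# Weight each discipline's accuracy by its number of samples.
overall = sum(count * acc for count, acc in disciplines.values()) / total_samples

print(f"samples={total_samples}, overall accuracy={overall:.3f}")
```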