sbrzz/TinyLLaVA-OpenELM-270M-Instruct-Dinov2-small

---
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: image-text-to-text
---

# Introduction

We use [TinyLLaVA Factory](https://github.com/TinyLLaVA/TinyLLaVA_Factory) to build a very small image-text-to-text model with only 296M parameters.

The goal is to make it possible to run LLaVA models on edge devices with only a few gigabytes of memory.
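To put the 296M-parameter figure in the context of that memory budget, here is a rough back-of-the-envelope sketch of the weight footprint at a few storage precisions. The bytes-per-parameter values are generic assumptions, not settings taken from this model card, and the estimate covers weights only (activations, the KV cache, and framework overhead are not included).

```python
# Rough estimate of weight memory for a 296M-parameter model.
PARAMS = 296_000_000  # total parameter count stated in this model card

# Assumed storage precisions (illustrative, not from the model card).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the size of the weights alone in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {weight_memory_gb(PARAMS, nbytes):.3f} GB")
```

Under these assumptions the weights take roughly 1.2 GB in float32 and about 0.6 GB in float16, which leaves headroom on a device with a few gigabytes of memory.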

For the LLM and the vision tower, we chose [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) and [facebook/dinov2-small](https://huggingface.co/facebook/dinov2-small), respectively.

# Results

[POPE](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#pope):

| Category    | # Samples | TP   | FP  | TN   | FN  | Accuracy | Precision | Recall | F1 Score | Yes Ratio |
|-------------|-----------|------|-----|------|-----|----------|-----------|--------|----------|-----------|
| Adversarial | 3000      | 1264 | 575 | 925  | 236 | 0.7297   | 0.6873    | 0.8427 | 0.7571   | 0.6130    |
| Popular     | 3000      | 1264 | 301 | 1199 | 236 | 0.8210   | 0.8077    | 0.8427 | 0.8248   | 0.5217    |
| Random      | 2910      | 1264 | 290 | 1120 | 236 | 0.8192   | 0.8134    | 0.8427 | 0.8278   | 0.5340    |
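The derived columns in the POPE table follow from the four confusion counts. The sketch below recomputes them using the standard definitions of accuracy, precision, recall, and F1. It assumes that "Yes Ratio" is the fraction of samples for which the model answered "yes", i.e. (TP + FP) / total. The function name `pope_metrics` is hypothetical and is not part of TinyLLaVA Factory.

```python
def pope_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive binary-classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Assumed definition: share of samples answered "yes".
        "yes_ratio": (tp + fp) / total,
    }


# Confusion counts (TP, FP, TN, FN) copied from the table above.
rows = {
    "Adversarial": (1264, 575, 925, 236),
    "Popular": (1264, 301, 1199, 236),
    "Random": (1264, 290, 1120, 236),
}

for name, counts in rows.items():
    metrics = pope_metrics(*counts)
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

The recall is the same in all three categories because TP and FN are identical across them. The splits differ only in their negative samples, which change FP and TN.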

[TEXTVQA](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#textvqa):

Samples: 5000, Accuracy: 27%

[SCIENCEQA](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#scienceqa):

Samples: 4241, Correct: 1725, Accuracy: 40.64%, IMG-Accuracy: 36.54%