---
license: mit
datasets:
- liuhaotian/LLaVA-Pretrain
- liuhaotian/LLaVA-Instruct-150K
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- apple/aimv2-large-patch14-224
- apple/OpenELM
pipeline_tag: image-text-to-text
tags:
- cpu
- nano
- small
- tiny
- llava
model_size: 0.6B parameters
---

<center><span style="font-size:2em;">Tiny Llava 4 CPU 🐛</span></center>

[![License](https://img.shields.io/badge/License-MIT-brightgreen.svg)](https://opensource.org/licenses/MIT)
[![CPU](https://img.shields.io/badge/CPU-Supported-blue)](https://huggingface.co)
[![arXiv](https://img.shields.io/badge/arXiv-2402.14289-red)](https://arxiv.org/pdf/2402.14289)

---

### 🚀 Model Overview
`tiny-llava-open-elm-aimv2` is a lightweight image-text-to-text model that pairs [OpenELM](https://huggingface.co/apple/OpenELM) as the language backbone with [AIMv2-Large-Patch14-224](https://huggingface.co/apple/aimv2-large-patch14-224) as the vision encoder. It was fine-tuned with LoRA (Low-Rank Adaptation), which trains small low-rank update matrices instead of the full weights, and it was built with the [TinyLLaVA Factory](https://github.com/TinyLLaVA/TinyLLaVA_Factory) codebase, a modular framework for small multimodal models.
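
LoRA keeps each pretrained weight matrix W frozen and learns a low-rank update BA, so an adapted layer computes h = Wx + (α/r)·BAx. The pure-Python sketch below illustrates that forward pass and the parameter savings for a single layer. The matrix sizes, the rank r = 8, and the scaling α are hypothetical illustration values. They are not the settings used to train this model.

```python
# Minimal LoRA sketch in pure Python (no external dependencies).
# Assumption: all sizes and values here are hypothetical illustrations,
# not the ranks or layers used to train this model.

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """Compute h = W x + (alpha / r) * B (A x).

    W: d_out x d_in (frozen), A: r x d_in and B: d_out x r (trainable).
    """
    r = len(a)
    base = matvec(w, x)               # frozen pretrained path
    update = matvec(b, matvec(a, x))  # low-rank path: down to r dims, then back up
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


def lora_param_count(d_out: int, d_in: int, r: int) -> tuple[int, int]:
    """Return (full fine-tuning params, LoRA trainable params) for one d_out x d_in weight."""
    return d_out * d_in, r * (d_in + d_out)


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 frozen weight (identity for readability)
    a = [[1.0, 1.0]]              # rank r = 1, so A is 1 x 2
    b = [[0.0], [0.0]]            # B starts at zero, so the adapter is a no-op at init
    print(lora_forward(w, a, b, [3.0, 4.0], alpha=1.0))  # [3.0, 4.0]
    print(lora_param_count(1024, 1024, 8))               # (1048576, 16384)
```

Initializing B to zero, as in the LoRA paper, makes the adapted layer match the frozen one at the start of training. For a hypothetical 1024 × 1024 layer at rank 8, the trainable parameters drop from about 1.05M to about 16K.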

The model is designed to run efficiently on CPU, which makes it well suited to resource-constrained environments. It was trained on the LLaVA-Pretrain and LLaVA-Instruct-150K datasets listed in the metadata and evaluated on the POPE and TextVQA benchmarks. In total, it has about 0.6B parameters.
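
For CPU deployment, a rough estimate of weight memory is the parameter count multiplied by the bytes per parameter. This minimal sketch assumes exactly 0.6 × 10⁹ parameters and counts weights only. Activations, the KV cache, and runtime overhead add to this figure. The card does not state which precision the released checkpoint uses, so the listed dtypes are illustrative.

```python
# Rough weight-memory estimate for a ~0.6B-parameter model.
# Assumptions: exactly 0.6e9 parameters (the card's rounded figure) and
# weights only; activations, the KV cache, and runtime overhead are excluded.

BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype:>9}: {weight_memory_gb(0.6e9, nbytes):.1f} GB")
```

This gives about 2.4 GB of weights at fp32, 1.2 GB at fp16/bf16, and 0.6 GB at int8.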

---

### 📊 Performance

| Model Name | VQAv2 | GQA | SQA | TextVQA | MM-Vet | POPE | MME | MMMU |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| [LLaVA-1.5-7B](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | 78.5 | 62.0 | 66.8 | 58.2 | 30.5 | 85.9 | 1510.7 | - |
| [bczhou/TinyLLaVA-3.1B](https://huggingface.co/bczhou/TinyLLaVA-3.1B) | 79.9 | 62.0 | 69.1 | 59.1 | 32.0 | 86.4 | 1464.9 | - |
| [tinyllava/TinyLLaVA-Gemma-SigLIP-2.4B](https://huggingface.co/tinyllava/TinyLLaVA-Gemma-SigLIP-2.4B) | 78.4 | 61.6 | 64.4 | 53.6 | 26.9 | 86.4 | 1339.0 | 31.7 |
| [tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B](https://huggingface.co/tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B) | 80.1 | 62.1 | 73.0 | 60.3 | 37.5 | 87.2 | 1466.4 | 38.4 |
| tiny-llava-open-elm-aimv2 | - | - | - | 39.68 | - | 83.93 | - | - |

A dash (-) means the score was not reported.
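
The two scores reported for this model use different metrics. POPE asks yes/no questions about whether an object is present in the image. It is scored with accuracy, precision, recall, and F1 (the metrics listed in the metadata), usually reported alongside the share of "yes" answers. TextVQA uses VQA-style soft accuracy: a prediction earns min(1, matches/3) against leave-one-out subsets of the ten human answers. The card does not say which POPE metric the 83.93 reflects.

The sketch below is a simplified version of both scorers. `pope_metrics`, `normalize`, and `vqa_accuracy` are hypothetical helper names. POPE predictions are assumed to be already mapped to "yes"/"no". Answer normalization is reduced to lowercasing and trimming, whereas the official scorers do more.

```python
# Simplified benchmark scorers (pure Python, illustrative only).
# Assumptions: POPE predictions are already mapped to "yes"/"no", and TextVQA
# answers are only lowercased and trimmed; the official scorers also strip
# punctuation and articles and map number words to digits.


def pope_metrics(labels: list[str], preds: list[str]) -> dict[str, float]:
    """Binary yes/no metrics with "yes" as the positive class."""
    pairs = list(zip(labels, preds))
    tp = sum(y == "yes" and p == "yes" for y, p in pairs)
    tn = sum(y == "no" and p == "no" for y, p in pairs)
    fp = sum(y == "no" and p == "yes" for y, p in pairs)
    fn = sum(y == "yes" and p == "no" for y, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": sum(p == "yes" for _, p in pairs) / len(pairs),
    }


def normalize(answer: str) -> str:
    """Hypothetical minimal normalization: trim whitespace and lowercase."""
    return answer.strip().lower()


def vqa_accuracy(pred: str, human_answers: list[str]) -> float:
    """Soft accuracy: min(1, matches / 3), averaged over leave-one-out subsets."""
    p = normalize(pred)
    answers = [normalize(a) for a in human_answers]
    scores = []
    for i in range(len(answers)):
        others = answers[:i] + answers[i + 1:]  # hold out the i-th human answer
        matches = sum(a == p for a in others)
        scores.append(min(1.0, matches / 3))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(pope_metrics(["yes", "yes", "no", "no"], ["yes", "yes", "yes", "no"]))
    print(vqa_accuracy("Stop", ["stop"] * 3 + ["yield"] * 7))
```

In the POPE example, recall is 1.0 but precision is only 2/3. This is one reason the yes-ratio is worth reporting: a model biased toward answering "yes" can look strong on recall alone.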

---

### 🔗 References
- [OpenELM](https://huggingface.co/apple/OpenELM)
- [AIMv2-Large-Patch14-224](https://huggingface.co/apple/aimv2-large-patch14-224)
- [TinyLLaVA Factory](https://github.com/TinyLLaVA/TinyLLaVA_Factory)
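- [TinyLLaVA: A Framework of Small-scale Large Multimodal Models (arXiv:2402.14289)](https://arxiv.org/pdf/2402.14289)
- [LoRA: Low-Rank Adaptation of Large Language Models (arXiv:2106.09685)](https://arxiv.org/abs/2106.09685)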