Update README.md
README.md
CHANGED
@@ -28,7 +28,7 @@ model_size: 0.6B parameters
 ---
 
 ### 🚀 **Model Overview**
-`tiny-llava-open-elm-aimv2` is a lightweight image-text-to-text model that combines **[OpenELM](https://huggingface.co/apple/OpenELM)** as the LLM backbone and **[AIMv2-Large-Patch14-224](https://huggingface.co/apple/aimv2-large-patch14-224)** as the vision encoder. The model has been fine-tuned using **LoRA (Low-Rank Adaptation)** for efficient training. It was developed using the **[TinyLLaVA Factory](https://github.com/TinyLLaVA/TinyLLaVA_Factory)** codebase, which provides a modular framework for lightweight multi-modal models.
+`tiny-llava-open-elm-aimv2` is a lightweight image-text-to-text model that combines **[OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct)** as the LLM backbone and **[AIMv2-Large-Patch14-224-distilled (309M)](https://huggingface.co/apple/aimv2-large-patch14-224-distilled)** as the vision encoder. The model has been fine-tuned using **LoRA (Low-Rank Adaptation)** for efficient training. It was developed using the **[TinyLLaVA Factory](https://github.com/TinyLLaVA/TinyLLaVA_Factory)** codebase, which provides a modular framework for lightweight multi-modal models.
 
 The model is designed to run efficiently on **CPU**, making it ideal for resource-constrained environments. It is trained and evaluated on the **POPE** and **TextVQA** benchmarks. The total model size is **0.6B parameters**.
 
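For context, here is a minimal sketch of how a TinyLLaVA Factory checkpoint such as this one is typically loaded for CPU inference. The repo id below is a placeholder, and the `chat()` helper with its `prompt`/`image`/`tokenizer` arguments is assumed to follow TinyLLaVA Factory's documented remote-code interface; the exact API for this particular upload may differ.

```python
# Sketch only: assumes the repo ships TinyLLaVA Factory's remote-code modeling
# files and exposes the same chat() helper as other Factory checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer

hf_path = "<namespace>/tiny-llava-open-elm-aimv2"  # placeholder repo id

# trust_remote_code pulls in the custom TinyLLaVA model/tokenizer code.
model = AutoModelForCausalLM.from_pretrained(hf_path, trust_remote_code=True)
model.eval()  # stays on CPU; no .cuda() call needed
tokenizer = AutoTokenizer.from_pretrained(hf_path, use_fast=False, trust_remote_code=True)

prompt = "What objects are in this image?"
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # any reachable image

# chat() is the convenience method TinyLLaVA Factory models expose via remote code;
# it is assumed here to return the generated answer and the generation time.
output_text, generation_time = model.chat(prompt=prompt, image=image_url, tokenizer=tokenizer)
print(output_text)
```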