# DinoV2-SigLIP-Phi3(LoRA)

Weights (align and fine-tune stages) for the DinoV2-SigLIP-Phi3(LoRA) model. The model details are as follows:

## Model and Dataset Details

* **Vision Encoder** - DinoV2 + SigLIP @384px resolution.
* **Connector** - MLP (DinoV2 and SigLIP features are concatenated and then projected into the Phi3 representation space)
* **Language Model** - Phi3 + LoRA
* **Pre-train (Align) Dataset** - LLaVA-CC3M-Pretrain-595K
* **Fine-tune (Instruction) Dataset** - LLaVA-v1.5-Instruct + LRV-Instruct
Scripts to build and train the model are available at [DinoV2-SigLIP-Phi3-LoRA-VLM](https://github.com/NMS05/DinoV2-SigLIP-Phi3-LoRA-VLM).
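
The connector described above can be sketched as a small PyTorch module: the two vision towers' patch features are concatenated along the feature dimension and passed through an MLP into the language model's embedding space. This is an illustrative sketch, not the exact checkpoint code; the feature dimensions (`dino_dim`, `siglip_dim`, `phi3_dim`) and the two-layer MLP shape are assumptions, so check the linked repository for the trained configuration.

```python
import torch
import torch.nn as nn

class MLPConnector(nn.Module):
    """Fuses DinoV2 and SigLIP patch features and projects them into the
    LLM embedding space. Dimensions are illustrative assumptions."""

    def __init__(self, dino_dim: int = 1024, siglip_dim: int = 1152, phi3_dim: int = 3072):
        super().__init__()
        # Two-layer MLP: (dino_dim + siglip_dim) -> phi3_dim -> phi3_dim
        self.proj = nn.Sequential(
            nn.Linear(dino_dim + siglip_dim, phi3_dim),
            nn.GELU(),
            nn.Linear(phi3_dim, phi3_dim),
        )

    def forward(self, dino_feats: torch.Tensor, siglip_feats: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch, num_patches, dim); concatenate per patch token
        fused = torch.cat([dino_feats, siglip_feats], dim=-1)
        # Output: (batch, num_patches, phi3_dim), ready to prepend to text embeddings
        return self.proj(fused)
```

The projected patch tokens are then treated as ordinary input embeddings for Phi3, prepended to the tokenized prompt.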