---
pipeline_tag: image-text-to-text
---

# Introduction

We use the powerful [TinyLLaVA Factory](https://github.com/TinyLLaVA/TinyLLaVA_Factory) to create a super small image-text-to-text model with only 296M parameters.

The goal is to make it possible to run LLaVA models on edge devices (with only a few gigabytes of memory).

For the LLM and vision tower, we choose [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) and [facebook/dinov2-small](https://huggingface.co/facebook/dinov2-small), respectively.

# Result

For now we have measured only [POPE](https://tinyllava-factory.readthedocs.io/en/latest/Evaluation.html#pope), with the following results:
| Category    | # Samples | TP   | FP  | TN   | FN  | Accuracy | Precision | Recall | F1 score | Yes ratio |
|-------------|-----------|------|-----|------|-----|----------|-----------|--------|----------|-----------|
| adversarial | 3000      | 1264 | 575 | 925  | 236 | 0.730    | 0.687     | 0.843  | 0.757    | 0.613     |
| popular     | 3000      | 1264 | 301 | 1199 | 236 | 0.821    | 0.808     | 0.843  | 0.825    | 0.522     |
| random      | 2910      | 1264 | 290 | 1120 | 236 | 0.819    | 0.813     | 0.843  | 0.828    | 0.534     |
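For reference, all of the derived metrics follow from the raw TP/FP/TN/FN counts via the standard definitions. A minimal sketch (the helper name `pope_metrics` is ours, not part of TinyLLaVA Factory's evaluation code), checked against the adversarial split:

```python
# Reproduce the POPE summary metrics from confusion-matrix counts,
# using the standard definitions of accuracy, precision, recall, and F1.

def pope_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute POPE-style metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Fraction of questions the model answered "yes" to.
        "yes_ratio": (tp + fp) / total,
    }

# Adversarial split counts from the table above: TP=1264, FP=575, TN=925, FN=236.
m = pope_metrics(1264, 575, 925, 236)
print({k: round(v, 3) for k, v in m.items()})
# → {'accuracy': 0.73, 'precision': 0.687, 'recall': 0.843, 'f1': 0.757, 'yes_ratio': 0.613}
```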