Commit cbd6135, committed by tarekziade
1 parent: a7f7ce0

Update README.md

  1. README.md +60 -94
README.md CHANGED
@@ -1,94 +1,60 @@
- ---
- tags:
- - image-to-text
- - image-captioning
- license: apache-2.0
- metrics:
- - rouge
- datasets:
- - Mozilla/flickr30k-transformed-captions
- widget:
- - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
-   example_title: Savanna
- - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
-   example_title: Football Match
- - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
-   example_title: Airport
- base_model:
- - google/vit-base-patch16-224-in21k
-
- model-index:
- - name: mozilla/distilvit
-   results:
-   - task:
-       type: image-to-text
-       name: Image To Text
-     dataset:
-       name: Mozilla/flickr30k-transformed-captions
-       type: Mozilla/flickr30k-transformed-captions
-     metrics:
-     - name: ROUGE-1
-       type: rouge
-       value: 43.006
-       verified: true
-     - name: ROUGE-2
-       type: rouge
-       value: 16.9939
-       verified: true
-     - name: ROUGE-L
-       type: rouge
-       value: 38.8923
-       verified: true
-     - name: ROUGE-LSUM
-       type: rouge
-       value: 38.8877
-       verified: true
-     - name: loss
-       type: loss
-       value: 0.19939416646957397
-     - name: gen_len
-       type: gen_len
-       value: 11.327256736227712
-       verified: true
- ---
-
- # distilvit
-
- This model is a work in progress. Fine-tuned version of those base models:
-
- - a VIT model for the image encoder: https://huggingface.co/google/vit-base-patch16-224-in21k
- - a Distilled GPT-2 model for the text decoder: https://huggingface.co/distilbert/distilgpt2
-
- This model was trained on:
-
- - [Flickr30k debiased](https://huggingface.co/datasets/Mozilla/flickr30k-transformed-captions-gpt4o)
- - [DocOrNot](https://huggingface.co/datasets/Mozilla/docornot)
- - [Alt Text Validation](https://huggingface.co/datasets/Mozilla/alt-text-validation)
- - A debiased version of COCO 2017: https://cocodataset.org
-
- You can find the code used to create the model here: https://github.com/mozilla/distilvit
-
-
- # training results
-
- - eval/gen_len 14.99729
- - eval/loss 0.17093
- - eval/meteor 0.51479
- - eval/rouge1 57.8066
- - eval/rouge2 35.0888
- - eval/rougeL 52.9138
- - eval/rougeLsum 52.9101
- - eval/runtime 760.2135
- - eval/samples_per_second 11.18
- - eval/steps_per_second 0.112
- - train/epoch 8.0
- - train/global_step 11752
- - train/learning_rate 0.0
- - train/loss 0.1034
- - train/total_flos 1.518634875573869e+20
- - train/train_loss 0.14875
- - train/train_runtime 91405.9053
- - train/train_samples_per_second 12.855
- - train/train_steps_per_second 0.129
-
-
 
+ ---
+ tags:
+ - image-to-text
+ - image-captioning
+ license: apache-2.0
+ metrics:
+ - rouge
+ datasets:
+ - Mozilla/flickr30k-transformed-captions-gpt4o
+ widget:
+ - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
+   example_title: Savanna
+ - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
+   example_title: Football Match
+ - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
+   example_title: Airport
+ base_model:
+ - google/vit-base-patch16-224-in21k
+ ---
+
+ # distilvit
+
+ This model is a work in progress. Fine-tuned version of those base models:
+
+ - a VIT model for the image encoder: https://huggingface.co/google/vit-base-patch16-224-in21k
+ - a Distilled GPT-2 model for the text decoder: https://huggingface.co/distilbert/distilgpt2
+
+ This model was trained on:
+
+ - [Flickr30k debiased](https://huggingface.co/datasets/Mozilla/flickr30k-transformed-captions-gpt4o)
+ - [DocOrNot](https://huggingface.co/datasets/Mozilla/docornot)
+ - [Alt Text Validation](https://huggingface.co/datasets/Mozilla/alt-text-validation)
+ - A debiased version of COCO 2017: https://cocodataset.org
+
+ You can find the code used to create the model here: https://github.com/mozilla/distilvit
+
+
+ # training results
+
+ - eval/gen_len 14.99729
+ - eval/loss 0.17093
+ - eval/meteor 0.51479
+ - eval/rouge1 57.8066
+ - eval/rouge2 35.0888
+ - eval/rougeL 52.9138
+ - eval/rougeLsum 52.9101
+ - eval/runtime 760.2135
+ - eval/samples_per_second 11.18
+ - eval/steps_per_second 0.112
+ - train/epoch 8.0
+ - train/global_step 11752
+ - train/learning_rate 0.0
+ - train/loss 0.1034
+ - train/total_flos 1.518634875573869e+20
+ - train/train_loss 0.14875
+ - train/train_runtime 91405.9053
+ - train/train_samples_per_second 12.855
+ - train/train_steps_per_second 0.129
+
+
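Note on usage (not part of the commit): the README tags the model as `image-to-text` but shows no loading snippet. The following is a minimal sketch of how such a model is typically called through the `transformers` `pipeline` API. It assumes the repo id `mozilla/distilvit` (taken from the `model-index` name in the removed front matter) and that the checkpoint loads under the `image-to-text` task; the helper name `caption_image` and its `captioner` parameter are hypothetical, introduced only for illustration.

```python
from typing import Any, Callable, Optional

# Assumption: repo id copied from the `model-index` name in the removed
# front matter; the README does not document how to load the model.
MODEL_ID = "mozilla/distilvit"


def caption_image(image: Any, captioner: Optional[Callable[[Any], list]] = None) -> str:
    """Return one caption for `image` (a local path, a URL, or a PIL image).

    `captioner` is a hypothetical injection point: any callable that returns
    a list of {"generated_text": ...} dicts, which is the shape produced by
    the transformers image-to-text pipeline.
    """
    if captioner is None:
        # Imported lazily so the helper can be exercised with a stub
        # captioner when transformers is not installed.
        from transformers import pipeline

        captioner = pipeline("image-to-text", model=MODEL_ID)

    outputs = captioner(image)
    # Take the first candidate and trim surrounding whitespace.
    return outputs[0]["generated_text"].strip()
```

Calling `caption_image(url)` with one of the widget image URLs from the front matter would download the checkpoint on first use. Passing a stub as `captioner` keeps the helper testable offline.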