---
tags:
- image-to-text
- image-captioning
license: apache-2.0
metrics:
- rouge
datasets:
- nlphuji/flickr30k
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
  example_title: Savanna
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
  example_title: Football Match
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
  example_title: Airport
base_model:
- google/vit-base-patch16-224-in21k

model-index:
- name: mozilla/distilvit
  results:
  - task:
      type: image-to-text
      name: Image To Text
    dataset:
      name: Mozilla/flickr30k-transformed-captions
      type: Mozilla/flickr30k-transformed-captions
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 43.006
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 16.9939
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 38.8923
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 38.8877
      verified: true
    - name: loss
      type: loss
      value: 0.19939416646957397
    - name: gen_len
      type: gen_len
      value: 11.327256736227712
      verified: true
---

# distilvit

This model is a work in progress. It is a fine-tuned version of the following base models:

- a ViT model for the image encoder: https://huggingface.co/google/vit-base-patch16-224-in21k
- a distilled GPT-2 model for the text decoder: https://huggingface.co/distilbert/distilgpt2
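
As a rough illustration of how an image encoder and a text decoder like these can be paired, the sketch below uses `VisionEncoderDecoderModel.from_encoder_decoder_pretrained` from the `transformers` library. It shows one possible way to combine the two base models and is not necessarily the procedure used to train distilvit. The helper name `build_untrained_captioner` and the choice to reuse the EOS token for padding are assumptions made for this example.

```python
ENCODER_ID = "google/vit-base-patch16-224-in21k"
DECODER_ID = "distilbert/distilgpt2"


def build_untrained_captioner(encoder_id=ENCODER_ID, decoder_id=DECODER_ID):
    """Pair a pretrained image encoder with a pretrained text decoder.

    The cross-attention layers added to the decoder start out randomly
    initialized, so the returned model needs fine-tuning on image/caption
    pairs before it produces useful captions.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoTokenizer, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
        encoder_id, decoder_id
    )
    tokenizer = AutoTokenizer.from_pretrained(decoder_id)

    # GPT-2 style tokenizers ship without a padding token; reusing EOS is a
    # common convention (an assumption here, not taken from the model card).
    tokenizer.pad_token = tokenizer.eos_token
    model.config.decoder_start_token_id = tokenizer.bos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    return model, tokenizer
```

Calling `build_untrained_captioner()` downloads both base checkpoints from the Hugging Face Hub, so it needs network access and the `transformers` and `torch` packages.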

This model was trained on:

- Flickr30k: https://huggingface.co/datasets/nlphuji/flickr30k
- COCO 2017: https://cocodataset.org

You can get that checkpoint using commit `3083a3cef6e3c8dd90df3f088074bbe836b0f403`.
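
A minimal sketch of loading that specific checkpoint follows, assuming the repository at this commit contains the model weights along with the image processor and tokenizer files. It relies on the `revision` argument of `from_pretrained` in `transformers`. The helper names `load_captioner` and `caption` and the `max_new_tokens` default are hypothetical choices for this example.

```python
MODEL_ID = "mozilla/distilvit"
# Commit hash quoted in this model card for the checkpoint described above.
PRE_FINETUNE_REVISION = "3083a3cef6e3c8dd90df3f088074bbe836b0f403"


def load_captioner(revision=PRE_FINETUNE_REVISION, model_id=MODEL_ID):
    """Load the model, image processor, and tokenizer pinned to one commit."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import (
        AutoImageProcessor,
        AutoTokenizer,
        VisionEncoderDecoderModel,
    )

    model = VisionEncoderDecoderModel.from_pretrained(model_id, revision=revision)
    processor = AutoImageProcessor.from_pretrained(model_id, revision=revision)
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    return model, processor, tokenizer


def caption(image, model, processor, tokenizer, max_new_tokens=30):
    """Generate one caption for a PIL image (RGB)."""
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    output_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

Passing `revision=None` to `load_captioner` loads the latest version of the repository instead of the pinned commit.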

It was then further fine-tuned on:

- [Flickr30k debiased](https://huggingface.co/datasets/Mozilla/flickr30k-transformed-captions)
- [DocOrNot](https://huggingface.co/datasets/Mozilla/docornot)
- [Alt Text Validation](https://huggingface.co/datasets/Mozilla/alt-text-validation)

For the latter, our team annotated the dataset to correct the alt text generated by the model, using the [checkvite tool](https://github.com/mozilla/checkvite).

You can find the code used to create the model here: https://github.com/mozilla/distilvit