---
tags:
- image-to-text
- image-captioning
license: apache-2.0
metrics:
- rouge
datasets:
- nlphuji/flickr30k
widget:
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg
  example_title: Savanna
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
  example_title: Football Match
- src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/airport.jpg
  example_title: Airport
base_model:
- google/vit-base-patch16-224-in21k
model-index:
- name: mozilla/distilvit
  results:
  - task:
      type: image-to-text
      name: Image To Text
    dataset:
      name: Mozilla/flickr30k-transformed-captions
      type: Mozilla/flickr30k-transformed-captions
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 43.006
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 16.9939
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 38.8923
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 38.8877
      verified: true
    - name: loss
      type: loss
      value: 0.19939416646957397
    - name: gen_len
      type: gen_len
      value: 11.327256736227712
      verified: true
---

# distilvit

This model is a work in progress. It is a fine-tuned version of the following base models:

- a ViT model for the image encoder: https://huggingface.co/google/vit-base-patch16-224-in21k
- a distilled GPT-2 model for the text decoder: https://huggingface.co/distilbert/distilgpt2

This model was trained on:

- Flickr30k: https://huggingface.co/datasets/nlphuji/flickr30k
- COCO 2017: https://cocodataset.org

You can get that checkpoint using the `3083a3cef6e3c8dd90df3f088074bbe836b0f403` commit.
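
As a minimal sketch of pinning that commit, the snippet below passes the hash as the `revision` argument of `from_pretrained`. It assumes the repository id `mozilla/distilvit` (taken from the card metadata) and that the checkpoint loads as a `VisionEncoderDecoderModel`, which matches the ViT encoder and GPT-2 decoder pairing described above but is not stated explicitly in this card. The helper name `load_base_checkpoint` is hypothetical.

```python
# Commit hash quoted in this card for the Flickr30k + COCO checkpoint.
BASE_CHECKPOINT_REVISION = "3083a3cef6e3c8dd90df3f088074bbe836b0f403"

# Repository id as it appears in the card metadata (assumed to be the Hub id).
MODEL_ID = "mozilla/distilvit"


def load_base_checkpoint(model_id=MODEL_ID, revision=BASE_CHECKPOINT_REVISION):
    """Load the weights pinned to a specific commit of the model repository.

    Hypothetical helper: it assumes the checkpoint is stored as a
    vision encoder-decoder model.
    """
    # Imported lazily so the constants above can be used without
    # transformers being installed.
    from transformers import VisionEncoderDecoderModel

    # `revision` accepts a branch name, a tag, or a full commit hash.
    return VisionEncoderDecoderModel.from_pretrained(model_id, revision=revision)
```

Calling `load_base_checkpoint()` downloads the weights from the Hub on first use, so it needs network access.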

It was then further fine-tuned on:

- [Flickr30k debiased](https://huggingface.co/datasets/Mozilla/flickr30k-transformed-captions)
- [DocOrNot](https://huggingface.co/datasets/Mozilla/docornot)
- [Alt Text Validation](https://huggingface.co/datasets/Mozilla/alt-text-validation)

The last of these datasets was annotated by our team to correct the alt text generated by the model, using the [checkvite tool](https://github.com/mozilla/checkvite).

You can find the code used to create the model here: https://github.com/mozilla/distilvit
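
For readers who want to try the model, here is a rough usage sketch based on the `image-to-text` tag in the card metadata. It assumes the installed `transformers` version still provides the `image-to-text` pipeline task and that the model is published as `mozilla/distilvit`. The helper names `extract_caption` and `caption_image` are hypothetical, and the sample image URL is one of the widget examples listed in the metadata.

```python
# Repository id as it appears in the card metadata (assumed to be the Hub id).
MODEL_ID = "mozilla/distilvit"

# One of the widget sample images listed in the card metadata.
SAMPLE_IMAGE_URL = (
    "https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg"
)


def extract_caption(pipeline_output):
    """Return the first caption from image-to-text pipeline output.

    The pipeline returns a list of dicts, each with a "generated_text" key.
    """
    return pipeline_output[0]["generated_text"].strip()


def caption_image(image, model_id=MODEL_ID):
    """Generate alt text for an image given as a URL, local path, or PIL image."""
    # Imported lazily so extract_caption can be used without transformers.
    from transformers import pipeline

    captioner = pipeline("image-to-text", model=model_id)
    return extract_caption(captioner(image))
```

Calling `caption_image(SAMPLE_IMAGE_URL)` downloads both the model and the image, so it needs network access. The exact caption depends on the model revision.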