distilvit
This model is a work in progress. It is a fine-tuned version of these base models:
- a ViT model for the image encoder: https://huggingface.co/google/vit-base-patch16-224-in21k
- a distilled GPT-2 model for the text decoder: https://huggingface.co/distilbert/distilgpt2
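The card does not show how the two base models are joined. A common way, and presumably the one used here (an assumption, not confirmed by this card), is the transformers `VisionEncoderDecoderModel`, which feeds the ViT patch embeddings to the GPT-2 decoder through added cross-attention layers. The sketch below rebuilds that architecture offline from configs with random weights, so it demonstrates tensor shapes rather than real captions. The layer sizes follow the published configs of the two base models. It needs `torch` and `transformers`.

```python
import torch
from transformers import (
    GPT2Config,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Encoder: ViT-Base with 16x16 patches on 224x224 images.
encoder_config = ViTConfig(
    image_size=224, patch_size=16, hidden_size=768,
    num_hidden_layers=12, num_attention_heads=12,
)
# Decoder: GPT-2 with 6 layers, matching distilgpt2 (GPT-2 small has 12).
decoder_config = GPT2Config(n_layer=6, n_head=12, n_embd=768)

# Marks the decoder as a decoder and enables its cross-attention layers.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = VisionEncoderDecoderModel(config=config).eval()  # random weights, no download

torch.manual_seed(0)
pixel_values = torch.randn(1, 3, 224, 224)  # one fake RGB image
decoder_input_ids = torch.randint(0, decoder_config.vocab_size, (1, 5))  # five fake caption tokens

with torch.no_grad():
    outputs = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids)

# 224 / 16 = 14 patches per side -> 196 patch embeddings + 1 [CLS] token.
print(outputs.encoder_last_hidden_state.shape)  # torch.Size([1, 197, 768])
# One next-token distribution over GPT-2's 50257-token vocabulary per position.
print(outputs.logits.shape)  # torch.Size([1, 5, 50257])
```

Building from configs avoids any download. To start from the pretrained base weights instead, `VisionEncoderDecoderModel.from_encoder_decoder_pretrained` accepts the two Hub IDs listed above.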
This model was trained on:
- Flickr30k: https://huggingface.co/datasets/nlphuji/flickr30k
- COCO 2017: https://cocodataset.org
You can retrieve that checkpoint at commit 3083a3cef6e3c8dd90df3f088074bbe836b0f403.
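Pinning a download to a commit is done with the `revision` argument of `from_pretrained`. The sketch below assumes that the commit belongs to the tarekziade/vit-base-patch16-224-distilgpt2 repository and that the repository ships its image processor and tokenizer files at that commit. `load_pinned_checkpoint` is a hypothetical helper, not part of any library.

```python
# Assumption: the commit belongs to this model's Hub repository.
REPO_ID = "tarekziade/vit-base-patch16-224-distilgpt2"
REVISION = "3083a3cef6e3c8dd90df3f088074bbe836b0f403"


def load_pinned_checkpoint(repo_id: str = REPO_ID, revision: str = REVISION):
    """Hypothetical helper: download model, image processor and tokenizer at a fixed commit."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

    model = VisionEncoderDecoderModel.from_pretrained(repo_id, revision=revision)
    image_processor = AutoImageProcessor.from_pretrained(repo_id, revision=revision)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, revision=revision)
    return model, image_processor, tokenizer
```

A full commit SHA, unlike a branch name such as `main`, always resolves to the same files. To caption an image `img`, pass `image_processor(images=img, return_tensors="pt").pixel_values` to `model.generate(...)` and decode the result with `tokenizer.batch_decode(..., skip_special_tokens=True)`.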
It was then further fine-tuned on:
For the latter, the dataset was annotated by our team to correct the alt text generated by the model, using the checkvite tool.
You can find the code used to create the model here: https://github.com/mozilla/distilvit
Model tree for tarekziade/vit-base-patch16-224-distilgpt2
- Base model: google/vit-base-patch16-224-in21k
Evaluation results
- ROUGE-1 on Mozilla/flickr30k-transformed-captions (self-reported): 43.006
- ROUGE-2 on Mozilla/flickr30k-transformed-captions (self-reported): 16.994
- ROUGE-L on Mozilla/flickr30k-transformed-captions (self-reported): 38.892
- ROUGE-LSUM on Mozilla/flickr30k-transformed-captions (self-reported): 38.888
- loss on Mozilla/flickr30k-transformed-captions (self-reported): 0.199
- gen_len on Mozilla/flickr30k-transformed-captions (self-reported): 11.327
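The card reports ROUGE scores without the scoring code. To illustrate what they measure, here is a minimal standard-library sketch of sentence-level ROUGE-1, ROUGE-2 and ROUGE-L F1 for one generated caption against one reference. It is not the evaluation script used for this model. The helper names are hypothetical, and the simple lowercase tokenizer (no stemming) is an assumption. The reported figures appear to be averages over the evaluation set scaled by 100, presumably computed with an established implementation such as Google's `rouge_score` package.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical helper: lowercase and keep alphanumeric runs (no stemming).
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of contiguous n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap: int, pred_total: int, ref_total: int) -> float:
    # Harmonic mean of precision (vs. prediction) and recall (vs. reference).
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction: str, reference: str, n: int) -> float:
    # Hypothetical helper: ROUGE-N F1 from clipped n-gram matches.
    pred, ref = ngrams(tokenize(prediction), n), ngrams(tokenize(reference), n)
    overlap = sum((pred & ref).values())
    return f1(overlap, sum(pred.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest common subsequence, O(len(a) * len(b)).
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction: str, reference: str) -> float:
    # Hypothetical helper: ROUGE-L F1 based on the longest common subsequence.
    pred, ref = tokenize(prediction), tokenize(reference)
    return f1(lcs_length(pred, ref), len(pred), len(ref))


prediction = "a dog runs across the grass"
reference = "a brown dog is running across the green grass"
scores = {
    "rouge1": rouge_n(prediction, reference, 1),  # 5 shared words -> 2/3
    "rouge2": rouge_n(prediction, reference, 2),  # only "across the" -> 2/13
    "rougeL": rouge_l(prediction, reference),     # LCS "a dog across the grass" -> 2/3
}
```

ROUGE-LSUM differs from ROUGE-L by first splitting the text into sentences on newlines. For single-sentence captions that split rarely matters, which is consistent with the two reported values being almost identical.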