distilvit
This model is a work in progress. It is a fine-tuned version of the following base models:
- a VIT model for the image encoder: https://huggingface.co/google/vit-base-patch16-224-in21k
- a Distilled GPT-2 model for the text decoder: https://huggingface.co/distilbert/distilgpt2
This model was trained on:
- Flickr30k: https://huggingface.co/datasets/nlphuji/flickr30k
- COCO 2017: https://cocodataset.org
The checkpoint from this first training stage is available at commit 3083a3cef6e3c8dd90df3f088074bbe836b0f403.
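A minimal sketch of pinning that commit when loading the model with the Transformers `pipeline` API, which accepts a `revision` argument. The repository id used here (`tarekziade/test-push`) and the helper name `load_captioner` are assumptions made for illustration; substitute the repository that actually hosts the model.

```python
# Commit hash quoted in this model card for the first-stage checkpoint.
REVISION = "3083a3cef6e3c8dd90df3f088074bbe836b0f403"

# Assumption: the Hub repository id; replace with the real one if it differs.
REPO_ID = "tarekziade/test-push"


def load_captioner(repo_id: str = REPO_ID, revision: str = REVISION):
    """Build an image-to-text pipeline pinned to a specific commit.

    The import is done lazily so that merely defining this helper
    does not require transformers to be installed.
    """
    from transformers import pipeline

    # `revision` may be a branch name, a tag, or a full commit hash.
    return pipeline("image-to-text", model=repo_id, revision=revision)
```

Calling `load_captioner()` downloads the weights on first use; the returned pipeline can then be called with an image path or URL to produce a caption.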
It was then further fine-tuned on:
- Flickr30k debiased: https://huggingface.co/datasets/Mozilla/flickr30k-transformed-captions
- DocOrNot: https://huggingface.co/datasets/Mozilla/docornot
You can find the code used to create the model here: https://github.com/mozilla/distilvit
Framework versions
- Transformers 4.40.2
- PyTorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
Evaluation results
- ROUGE-1 on nlphuji/flickr30k (self-reported): 43.006
- ROUGE-2 on nlphuji/flickr30k (self-reported): 16.994
- ROUGE-L on nlphuji/flickr30k (self-reported): 38.892
- ROUGE-LSUM on nlphuji/flickr30k (self-reported): 38.888
- loss on nlphuji/flickr30k (self-reported): 0.199
- gen_len on nlphuji/flickr30k (self-reported): 11.327
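To give a sense of what the ROUGE-1 figure measures, here is a simplified sketch of a unigram-overlap F1 score between a generated caption and a reference caption. It is not the evaluation script used for this model: it assumes lowercase whitespace tokenization with no stemming, and the function name `rouge1_f1` is hypothetical. It returns a value between 0 and 1, whereas the scores listed above appear to be on a 0 to 100 scale.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: each word counts at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Example: 3 shared words, 5 predicted words, 3 reference words.
score = rouge1_f1("a dog runs on grass", "a dog runs")
```

In this example, precision is 3/5 and recall is 3/3, so the F1 score is 0.75.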