MobileCLIP2: Improving Multi-Modal Reinforced Training
Paper: arXiv:2508.20691
MobileCLIP2: Mobile-friendly image-text models with SOTA zero-shot capabilities, trained on DFNDR-2B
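As a minimal zero-shot classification sketch, assuming the released image-text checkpoints can be loaded through the standard open_clip interface: the "openai" weights below are only a stand-in, and the actual MobileCLIP2 model name and checkpoint path from this collection would need to be substituted.

```python
import torch
import open_clip
from PIL import Image

# Stand-in weights: replace "ViT-L-14"/"openai" with the MobileCLIP2 model name
# and checkpoint path from this collection (e.g. pretrained="/path/to/checkpoint.pt").
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-L-14")
model.eval()

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
text = tokenizer(["a photo of a dog", "a photo of a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # zero-shot probabilities over the two text prompts
```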
Note: Timm ViT-L/14 architecture trained on DFNDR-2B (the MobileCLIP2 dataset)
Note: MobileCLIP2 architectures pretrained on DataCompDR (the MobileCLIP v1 dataset)
Note: Timm ViT-L/14 architecture pretrained on DataCompDR (the MobileCLIP v1 dataset)
Note: Timm checkpoints (a loading sketch follows below)
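A sketch of how the Timm checkpoints above can be used as image encoders for feature extraction. The model id below is a public timm CLIP vision tower used only as a stand-in; the actual checkpoint id from this collection would need to be substituted (for Hub-hosted weights, timm also accepts an "hf_hub:<org>/<repo>" id).

```python
import timm
import torch
from PIL import Image

# Stand-in model id: swap for the ViT-L/14 checkpoint id from this collection.
model = timm.create_model("vit_large_patch14_clip_224.openai", pretrained=True, num_classes=0)
model.eval()

# Build the preprocessing pipeline matching the model's pretrained config.
config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

image = transform(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    features = model(image)  # pooled image embedding from the vision tower
print(features.shape)
```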
Note: MobileCLIP2 CoCa models used to generate the synthetic captions for training the MobileCLIP2 models
Note: MobileCLIP2 CoCa models with context length 256; these have a higher chance of generating repeated output
Note: MobileCLIP2 CoCa base model, which can be fine-tuned into new CoCa models on high-quality datasets
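A caption-generation sketch, assuming the CoCa checkpoints above follow open_clip's CoCa interface: the public LAION weights below are only a stand-in, and the actual MobileCLIP2 CoCa checkpoint path would need to be substituted.

```python
import torch
import open_clip
from PIL import Image

# Stand-in weights: replace the pretrained tag with the MobileCLIP2 CoCa
# checkpoint path (e.g. pretrained="/path/to/coca_checkpoint.pt").
model, _, transform = open_clip.create_model_and_transforms(
    "coca_ViT-L-14", pretrained="mscoco_finetuned_laion2B-s13B-b90k"
)
model.eval()

image = transform(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    generated = model.generate(image)

# Strip the special tokens to get the plain caption text.
caption = open_clip.decode(generated[0])
caption = caption.split("<end_of_text>")[0].replace("<start_of_text>", "")
print(caption)
```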