---
library_name: transformers
pipeline_tag: zero-shot-image-classification
license: cc-by-nc-4.0
tags:
- clip
- multilingual
---

# Model Card for Distilled MetaCLIP 2 ViT-B/32 (mT5 Tokenizer) (worldwide)

Distilled MetaCLIP 2 (worldwide) was presented in [MetaCLIP 2: A Worldwide Scaling Recipe](https://huggingface.co/papers/2507.22062).

This checkpoint corresponds to "ViT-B-32-mT5-worldwide" of the [original implementation](https://github.com/facebookresearch/MetaCLIP).

## Install

First install the Transformers library (from source for now):

```bash
pip install -q git+https://github.com/huggingface/transformers.git
```

## Usage

You can then run zero-shot image classification with the `pipeline` API:

```python
import torch
from transformers import pipeline

clip = pipeline(
   task="zero-shot-image-classification",
   model="facebook/metaclip-2-mt5-worldwide-b32",
   torch_dtype=torch.bfloat16,
   device=0
)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

results = clip("http://images.cocodataset.org/val2017/000000039769.jpg", candidate_labels=labels)
print(results)
```

If you want to handle pre- and postprocessing yourself, use the `AutoModel` API:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

# note: verify that the loaded model is an instance of `MetaClip2Model`
model = AutoModel.from_pretrained("facebook/metaclip-2-mt5-worldwide-b32", torch_dtype=torch.bfloat16, attn_implementation="sdpa")
processor = AutoProcessor.from_pretrained("facebook/metaclip-2-mt5-worldwide-b32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image
probs = logits_per_image.softmax(dim=1)
most_likely_idx = probs.argmax(dim=1).item()
most_likely_label = labels[most_likely_idx]
print(f"Most likely label: {most_likely_label} with probability: {probs[0][most_likely_idx].item():.3f}")
```
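
The postprocessing at the end of the `AutoModel` example (a softmax over `logits_per_image`, followed by an argmax) can be sketched without loading the model. The snippet below is a minimal illustration using only the Python standard library. The `softmax` helper and the `toy_logits` values are assumptions made up for this example. They are not real model outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Hypothetical logits for one image against the three labels (made-up values).
toy_logits = [24.0, 19.5, 12.0]

# Same steps as in the model example: softmax across labels, then argmax.
probs = softmax(toy_logits)
best_idx = max(range(len(probs)), key=probs.__getitem__)

print(f"Most likely label: {labels[best_idx]} with probability: {probs[best_idx]:.3f}")
```

Because the softmax is taken across the candidate labels, the probabilities are relative to that label set only. Changing the set of labels changes the scores.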