rwightman HF staff committed on
Commit
cc6d637
1 Parent(s): f0bb3f6

Update README.md

  1. README.md +92 -1
README.md CHANGED
@@ -1,8 +1,99 @@
  ---
  tags:
  - clip
+ - siglip
  library_name: open_clip
  pipeline_tag: zero-shot-image-classification
- license: mit
+ license: apache-2.0
+ datasets:
+ - webli
  ---
  # Model card for ViT-SO400M-16-SigLIP-i18n-256
+
+ A SigLIP (Sigmoid loss for Language-Image Pre-training) model trained on WebLI in multiple languages (i18n variant) with a multilingual tokenizer.
+
+ This model has been converted to PyTorch from the original JAX checkpoints in [Big Vision](https://github.com/google-research/big_vision). These weights are usable in both OpenCLIP (image + text) and timm (image only).
+
+ ## Model Details
+ - **Model Type:** Contrastive Image-Text, Zero-Shot Image Classification.
+ - **Original:** https://github.com/google-research/big_vision
+ - **Dataset:** WebLI
+ - **Papers:**
+   - Sigmoid loss for language image pre-training: https://arxiv.org/abs/2303.15343
+
+ ## Model Usage
+ ### With OpenCLIP
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from urllib.request import urlopen
+ from PIL import Image
+ from open_clip import create_model_from_pretrained, get_tokenizer # works on open-clip-torch>=2.27, timm>=1.0.10
+
+ model, preprocess = create_model_from_pretrained('hf-hub:timm/ViT-SO400M-16-SigLIP-i18n-256')
+ tokenizer = get_tokenizer('hf-hub:timm/ViT-SO400M-16-SigLIP-i18n-256')
+
+ image = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+ image = preprocess(image).unsqueeze(0)
+
+ labels_list = ["a dog", "a cat", "a donut", "a beignet"]
+ text = tokenizer(labels_list, context_length=model.context_length)
+
+ with torch.no_grad(), torch.cuda.amp.autocast():
+     image_features = model.encode_image(image)
+     text_features = model.encode_text(text)
+     image_features = F.normalize(image_features, dim=-1)
+     text_features = F.normalize(text_features, dim=-1)
+
+     text_probs = torch.sigmoid(image_features @ text_features.T * model.logit_scale.exp() + model.logit_bias)
+
+ zipped_list = list(zip(labels_list, [round(p.item(), 3) for p in text_probs[0]]))
+ print("Label probabilities: ", zipped_list)
+ ```
+
+ ### With `timm` (for image embeddings)
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ image = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'vit_so400m_patch16_siglip_256.webli_i18n',
+     pretrained=True,
+     num_classes=0,
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(image).unsqueeze(0)) # output is (batch_size, num_features) shaped tensor
+ ```
+
+ ## Citation
+ ```bibtex
+ @article{zhai2023sigmoid,
+   title={Sigmoid loss for language image pre-training},
+   author={Zhai, Xiaohua and Mustafa, Basil and Kolesnikov, Alexander and Beyer, Lucas},
+   journal={arXiv preprint arXiv:2303.15343},
+   year={2023}
+ }
+ ```
+ ```bibtex
+ @misc{big_vision,
+   author = {Beyer, Lucas and Zhai, Xiaohua and Kolesnikov, Alexander},
+   title = {Big Vision},
+   year = {2022},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   howpublished = {\url{https://github.com/google-research/big_vision}}
+ }
+ ```
+
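A note on the OpenCLIP snippet added in this commit: it scores labels with `torch.sigmoid` rather than a softmax, so each label receives an independent probability and the values need not sum to 1. The following is a minimal pure-Python sketch of that scoring step, not the model's actual code. The vectors and the `logit_scale` and `logit_bias` values are made-up toy numbers, and the helper names `l2_normalize` and `sigmoid_scores` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (mirrors F.normalize in the README snippet)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def sigmoid_scores(image_vec, text_vecs, logit_scale, logit_bias):
    """Score each text embedding against one image embedding independently.

    logit = cosine_similarity * exp(logit_scale) + logit_bias, then a sigmoid.
    """
    img = l2_normalize(image_vec)
    scores = []
    for text_vec in text_vecs:
        txt = l2_normalize(text_vec)
        cosine = sum(a * b for a, b in zip(img, txt))
        logit = cosine * math.exp(logit_scale) + logit_bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


# Toy 2-D embeddings (assumed values, not real model outputs).
image_embedding = [1.0, 0.0]
text_embeddings = [
    [1.0, 0.0],  # label aligned with the image
    [1.0, 0.1],  # a second, nearly aligned label
    [0.0, 1.0],  # an unrelated label
]

# Assumed scale and bias; the real values are learned parameters of the model.
scores = sigmoid_scores(image_embedding, text_embeddings, math.log(10.0), -5.0)
print([round(s, 3) for s in scores])
```

Because each score is computed on its own, two closely related labels (such as "a donut" and "a beignet" in the README example) can both score highly, which a softmax over the labels would not allow.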