Commit bbc642f by rwightman (HF staff)
1 Parent(s): 03012ea

Update model config and README

Files changed (2)
  1. README.md +109 -2
  2. model.safetensors +3 -0
README.md CHANGED
@@ -2,6 +2,113 @@
  tags:
  - image-classification
  - timm
- library_tag: timm
+ library_name: timm
+ license: apache-2.0
+ datasets:
+ - imagenet-1k
  ---
- # Model card for vit_small_patch8_224.dino
+ # Model card for vit_small_patch8_224.dino
+
+ A Vision Transformer (ViT) image classification model, trained with the self-supervised DINO method.
+
+
+ ## Model Details
+ - **Model Type:** Image classification / feature backbone
+ - **Model Stats:**
+   - Params (M): 21.7
+   - GMACs: 16.8
+   - Activations (M): 32.9
+   - Image size: 224 x 224
+ - **Papers:**
+   - Emerging Properties in Self-Supervised Vision Transformers: https://arxiv.org/abs/2104.14294
+   - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
+ - **Dataset:** ImageNet-1k
+ - **Original:** https://github.com/facebookresearch/dino
+
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+ import torch  # required for torch.topk below
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('vit_small_patch8_224.dino', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
+ ```
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'vit_small_patch8_224.dino',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 785, 384) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
+
+ ## Model Comparison
+ Explore the dataset and runtime metrics of this model in timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
+
+ ## Citation
+ ```bibtex
+ @inproceedings{caron2021emerging,
+   title={Emerging properties in self-supervised vision transformers},
+   author={Caron, Mathilde and Touvron, Hugo and Misra, Ishan and J{\'e}gou, Herv{\'e} and Mairal, Julien and Bojanowski, Piotr and Joulin, Armand},
+   booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
+   pages={9650--9660},
+   year={2021}
+ }
+ ```
+ ```bibtex
+ @article{dosovitskiy2020vit,
+   title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+   author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
+   journal={ICLR},
+   year={2021}
+ }
+ ```
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
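
The `(1, 785, 384)` shape quoted in the Image Embeddings example of the README can be checked by hand. The sketch below is not part of the commit; it assumes a standard ViT layout with square non-overlapping patches and a single class token prepended to the patch sequence. The helper name `vit_token_count` is hypothetical and not a timm function.

```python
def vit_token_count(image_size: int, patch_size: int, num_prefix_tokens: int = 1) -> int:
    """Sequence length of a ViT: patch tokens plus prefix tokens (assumed: one class token)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size          # patches along one side
    return grid * grid + num_prefix_tokens   # square grid plus prefix tokens


# 224 / 8 = 28 patches per side, 28 * 28 = 784 patch tokens, plus 1 class token.
tokens = vit_token_count(image_size=224, patch_size=8)
print(tokens)
```

The last dimension, 384, is the embedding width of the ViT-Small configuration, so `forward_features` returns one 384-dimensional vector per token for a batch of one image.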
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e1b79616ce6d1b8dc3cc0d5952b6d0958356a189122882493e665d28f20654a0
+ size 86694768
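
The three added lines are a Git LFS pointer, not the weights themselves: they record the spec version, a SHA-256 digest, and the byte size of the real `model.safetensors`. As a rough illustration outside the commit, a downloaded copy could be checked against the pointer as sketched below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is something the reader would supply.

```python
import hashlib
from pathlib import Path

# Pointer contents as shown in the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e1b79616ce6d1b8dc3cc0d5952b6d0958356a189122882493e665d28f20654a0
size 86694768
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines of an LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and digest the pointer records."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large weight files are not read into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("model.safetensors", pointer)` on a local download would then confirm the file is intact. The recorded size is also roughly what one would expect for about 21.7 M parameters stored at 4 bytes each, assuming float32 weights plus a small header.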