timm / mambaout_base.in1k

Image Classification · timm · PyTorch · Safetensors
rwightman committed
Commit 2b222e3 · 1 Parent(s): bfa7061
Files changed (4)
  1. README.md +150 -0
  2. config.json +40 -0
  3. model.safetensors +3 -0
  4. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,150 @@
---
tags:
- image-classification
- timm
library_name: timm
license: apache-2.0
datasets:
- imagenet-1k
---
# Model card for mambaout_base.in1k

A MambaOut image classification model. Pretrained on ImageNet-1k by the paper authors.

## Model Details
- **Model Type:** Image classification / feature backbone
- **Model Stats:**
  - Params (M): 84.8
  - GMACs: 15.8
  - Activations (M): 36.9
  - Image size: train = 224 x 224, test = 288 x 288
- **Dataset:** ImageNet-1k
- **Papers:**
  - MambaOut: Do We Really Need Mamba for Vision?: https://arxiv.org/abs/2405.07992
- **Original:** https://github.com/yuweihao/MambaOut
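Note that the train and test resolutions differ (224 vs. 288). A minimal sketch of building eval transforms at the larger test resolution; it assumes a recent timm release where `resolve_model_data_config` accepts a `use_test_size` flag:

```python
import timm

model = timm.create_model('mambaout_base.in1k', pretrained=True).eval()

# resolve the data config at the 288 x 288 test resolution instead of the
# 224 x 224 train resolution; use_test_size is assumed available in recent timm
data_config = timm.data.resolve_model_data_config(model, use_test_size=True)
transforms = timm.data.create_transform(**data_config, is_training=False)
print(data_config['input_size'])  # expected: (3, 288, 288)
```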
## Model Usage
### Image Classification
```python
from urllib.request import urlopen
from PIL import Image
import timm
import torch

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('mambaout_base.in1k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
```
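As a small follow-up (not part of the original card), the top-5 tensors above can be printed directly; mapping the class indices to ImageNet-1k label names would require an external class-index file:

```python
# print the top-5 class ids and their probabilities (in percent),
# using the tensors produced by the snippet above
for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
    print(f"class {idx.item()}: {prob.item():.2f}%")
```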
### Feature Map Extraction
```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'mambaout_base.in1k',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 56, 56, 128])
    #  torch.Size([1, 28, 28, 256])
    #  torch.Size([1, 14, 14, 512])
    #  torch.Size([1, 7, 7, 768])
    print(o.shape)
```
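As the printed shapes show, these feature maps come out channels-last, i.e. (B, H, W, C). A minimal sketch (not from the original card) for converting them to the (B, C, H, W) layout most downstream PyTorch conv modules expect:

```python
# permute channels-last (B, H, W, C) feature maps to channels-first (B, C, H, W)
nchw = [o.permute(0, 3, 1, 2).contiguous() for o in output]
for f in nchw:
    print(f.shape)  # e.g. torch.Size([1, 128, 56, 56]) for the first stage
```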
### Image Embeddings
```python
from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'mambaout_base.in1k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)
output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 7, 7, 768) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
```
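One common use of such embeddings is image-to-image similarity. A hedged sketch building on the variables above; `emb_b` is a stand-in for a second image embedded the same way:

```python
import torch.nn.functional as F

emb_a = model.forward_head(model.forward_features(transforms(img).unsqueeze(0)), pre_logits=True)
emb_b = emb_a  # placeholder: in practice, embed a different image the same way
similarity = F.cosine_similarity(emb_a, emb_b)  # shape (1,); 1.0 for identical inputs
print(similarity.item())
```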
## Model Comparison
### By Top-1

|model |img_size|top1 |top5 |param_count (M)|
|---|---|---|---|---|
|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](https://huggingface.co/timm/mambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|288|86.912|98.236|101.66|
|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](https://huggingface.co/timm/mambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|224|86.632|98.156|101.66|
|[mambaout_base_tall_rw.sw_e500_in1k](https://huggingface.co/timm/mambaout_base_tall_rw.sw_e500_in1k)|288|84.974|97.332|86.48|
|[mambaout_base_wide_rw.sw_e500_in1k](https://huggingface.co/timm/mambaout_base_wide_rw.sw_e500_in1k)|288|84.962|97.208|94.45|
|[mambaout_base_short_rw.sw_e500_in1k](https://huggingface.co/timm/mambaout_base_short_rw.sw_e500_in1k)|288|84.832|97.27|88.83|
|[mambaout_base.in1k](https://huggingface.co/timm/mambaout_base.in1k)|288|84.72|96.93|84.81|
|[mambaout_small_rw.sw_e450_in1k](https://huggingface.co/timm/mambaout_small_rw.sw_e450_in1k)|288|84.598|97.098|48.5|
|[mambaout_small.in1k](https://huggingface.co/timm/mambaout_small.in1k)|288|84.5|96.974|48.49|
|[mambaout_base_wide_rw.sw_e500_in1k](https://huggingface.co/timm/mambaout_base_wide_rw.sw_e500_in1k)|224|84.454|96.864|94.45|
|[mambaout_base_tall_rw.sw_e500_in1k](https://huggingface.co/timm/mambaout_base_tall_rw.sw_e500_in1k)|224|84.434|96.958|86.48|
|[mambaout_base_short_rw.sw_e500_in1k](https://huggingface.co/timm/mambaout_base_short_rw.sw_e500_in1k)|224|84.362|96.952|88.83|
|[mambaout_base.in1k](https://huggingface.co/timm/mambaout_base.in1k)|224|84.168|96.68|84.81|
|[mambaout_small.in1k](https://huggingface.co/timm/mambaout_small.in1k)|224|84.086|96.63|48.49|
|[mambaout_small_rw.sw_e450_in1k](https://huggingface.co/timm/mambaout_small_rw.sw_e450_in1k)|224|84.024|96.752|48.5|
|[mambaout_tiny.in1k](https://huggingface.co/timm/mambaout_tiny.in1k)|288|83.448|96.538|26.55|
|[mambaout_tiny.in1k](https://huggingface.co/timm/mambaout_tiny.in1k)|224|82.736|96.1|26.55|
|[mambaout_kobe.in1k](https://huggingface.co/timm/mambaout_kobe.in1k)|288|81.054|95.718|9.14|
|[mambaout_kobe.in1k](https://huggingface.co/timm/mambaout_kobe.in1k)|224|79.986|94.986|9.14|
|[mambaout_femto.in1k](https://huggingface.co/timm/mambaout_femto.in1k)|288|79.848|95.14|7.3|
|[mambaout_femto.in1k](https://huggingface.co/timm/mambaout_femto.in1k)|224|78.87|94.408|7.3|
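To enumerate the MambaOut variants compared above (not shown in the original card), timm's model registry can be queried:

```python
import timm

# list every MambaOut model with pretrained weights registered in timm
for name in timm.list_models('mambaout*', pretrained=True):
    print(name)
```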
## Citation
```bibtex
@article{yu2024mambaout,
  title={MambaOut: Do We Really Need Mamba for Vision?},
  author={Yu, Weihao and Wang, Xinchao},
  journal={arXiv preprint arXiv:2405.07992},
  year={2024}
}
```
config.json ADDED
@@ -0,0 +1,40 @@
{
    "architecture": "mambaout_base",
    "num_classes": 1000,
    "num_features": 768,
    "pretrained_cfg": {
        "tag": "in1k",
        "custom_load": false,
        "input_size": [
            3,
            224,
            224
        ],
        "test_input_size": [
            3,
            288,
            288
        ],
        "fixed_input_size": false,
        "interpolation": "bicubic",
        "crop_pct": 1.0,
        "crop_mode": "center",
        "mean": [
            0.485,
            0.456,
            0.406
        ],
        "std": [
            0.229,
            0.224,
            0.225
        ],
        "num_classes": 1000,
        "pool_size": [
            7,
            7
        ],
        "first_conv": "stem.conv1",
        "classifier": "head.fc"
    }
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19723a5235e3d68c7b46b49f5d1025a61df6ddc6d42d555e33c1fbe9f139769c
size 339284368
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:801791762a8479fba3b23994b9beff1d18a83c7767416e8a090867d9903342eb
size 339366810