---
tags:
- image-classification
library_name: coreml
license: other
license_name: apple-ascl
license_link: LICENSE
datasets:
- imagenet-1k
---

# FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

Please observe the [original license](https://github.com/apple/ml-fastvit/blob/8af5928238cab99c45f64fc3e4e7b1516b8224ba/LICENSE).

## Model Details

- **Model Type:** Image classification / feature backbone
- **Model Stats:**
  - Params (M): 4.0
  - GMACs: 0.7
  - Activations (M): 8.6
  - Image size: 256 x 256
- **Papers:**
  - FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization: https://arxiv.org/abs/2303.14189
- **Original:** https://github.com/apple/ml-fastvit
- **Dataset:** ImageNet-1k

## Evaluation - Variants

| Variant                                                 | Parameters | Size (MB) | Weight precision | Act. precision | Δ PyTorch acc |
| ------------------------------------------------------- | ---------: | --------: | ---------------- | -------------- | ------------- |
| [T8](https://huggingface.co/apple/FastViTT8F16.mlpackage)     |      3.6M  |       7.8 | Float16          | Float16        |  -0.9%        |
| [MA36](https://huggingface.co/apple/FastViTMA36F16.mlpackage) |      42.7M |        84 | Float16          | Float16        | -0.06%        |
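
As a rough sanity check on the Size column, Float16 weights take 2 bytes each, so parameter count times 2 bytes should land near the listed size. The sketch below assumes MB means 10^6 bytes and counts only raw weights; the names `VARIANTS` and `estimated_weight_mb` are illustrative, not part of any library. The listed sizes differ slightly from the estimates, which may reflect rounding of the parameter counts, the unit convention, or non-weight data in the package.

```python
BYTES_PER_FLOAT16 = 2

# (parameter count, listed size in MB) copied from the table above.
VARIANTS = {
    "T8": (3.6e6, 7.8),
    "MA36": (42.7e6, 84.0),
}


def estimated_weight_mb(num_params: float, bytes_per_weight: int = BYTES_PER_FLOAT16) -> float:
    """Raw weight storage in MB (assumed 10^6 bytes), ignoring metadata and other package contents."""
    return num_params * bytes_per_weight / 1e6


for name, (params, listed_mb) in VARIANTS.items():
    estimate = estimated_weight_mb(params)
    print(f"{name}: estimated {estimate:.1f} MB, listed {listed_mb} MB")
```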

## Evaluation - Inference time

| Variant | Device            | OS version | Inference time (ms) | Dominant compute unit |
| ------- | ----------------- | ---------- | ------------------: | --------------------- |
| T8      | iPhone 12 Pro Max | iOS 17.5   |                0.79 | Neural Engine         |
| T8      | M3 Max            | macOS 14.4 |                0.62 | Neural Engine         |
| MA36    | iPhone 12 Pro Max | iOS 18.0   |                4.50 | Neural Engine         |
| MA36    | M3 Max            | macOS 15.0 |                2.99 | Neural Engine         |
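
To put these latencies in perspective, they can be converted to an approximate throughput. The sketch below assumes single-image inferences run back to back with no preprocessing or scheduling overhead, so the figures are an upper bound rather than a measured result; `LATENCY_MS` and `images_per_second` are illustrative names.

```python
# Single-image inference times (ms) copied from the table above.
LATENCY_MS = {
    ("T8", "iPhone 12 Pro Max"): 0.79,
    ("T8", "M3 Max"): 0.62,
    ("MA36", "iPhone 12 Pro Max"): 4.50,
    ("MA36", "M3 Max"): 2.99,
}


def images_per_second(latency_ms: float) -> float:
    """Upper-bound throughput if inferences ran back to back with no other overhead."""
    return 1000.0 / latency_ms


for (variant, device), ms in LATENCY_MS.items():
    print(f"{variant:>4} on {device}: ~{images_per_second(ms):.0f} images/s")
```

In a real app, camera capture, image resizing, and result handling add to the per-frame cost, so measured throughput will be lower.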

## Download

Install `huggingface-cli`:

```bash
brew install huggingface-cli
```

To download one of the `.mlpackage` folders to the `models` directory:

```bash
huggingface-cli download \
  --local-dir models --local-dir-use-symlinks False \
  apple/coreml-FastViT-T8
```
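
The same download can also be scripted from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed (for example via `pip install huggingface_hub`); `download_fastvit` is a hypothetical helper name, while `snapshot_download` is the library function that fetches every file in a repository.

```python
def download_fastvit(repo_id: str = "apple/coreml-FastViT-T8", local_dir: str = "models") -> str:
    """Download every file in the repository into local_dir and return the local path."""
    # Imported lazily so the helper can be defined even before the package is installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_fastvit()` with its defaults mirrors the CLI command above and needs network access.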

## Integrate in Swift apps

The [`huggingface/coreml-examples`](https://github.com/huggingface/coreml-examples) repository contains sample Swift code for `coreml-FastViT-T8` and other models. See [the instructions there](https://github.com/huggingface/coreml-examples/tree/main/FastViTSample) to build the demo app, which shows how to use the model in your own Swift apps.

## Citation

```bibtex
@inproceedings{vasufastvit2023,
  author = {Pavan Kumar Anasosalu Vasu and James Gabriel and Jeff Zhu and Oncel Tuzel and Anurag Ranjan},
  title = {FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year = {2023}
}
```