Update README.md
README.md
CHANGED
@@ -3,6 +3,48 @@ license: apache-2.0
pipeline_tag: mask-generation
---

# EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

- [Paper](https://arxiv.org/abs/2402.05008)
- [GitHub](https://github.com/mit-han-lab/efficientvit)
- [Demo](https://evitsam.hanlab.ai/)

## Pretrained Models

Latency and throughput are measured with TensorRT in fp16, on an NVIDIA Jetson AGX Orin (latency, batch size 1) and an NVIDIA A100 GPU (throughput, batch size 16). Data transfer time is included.

| Model | Resolution | COCO mAP | LVIS mAP | Params | MACs | Jetson Orin Latency (bs1) | A100 Throughput (bs16) | Checkpoint |
|----------------------|:----------:|:----------:|:---------:|:------------:|:---------:|:---------:|:------------:|:------------:|
| EfficientViT-SAM-L0 | 512x512 | 45.7 | 41.8 | 34.8M | 35G | 8.2ms | 762 images/s | [link](https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l0.pt) |
| EfficientViT-SAM-L1 | 512x512 | 46.2 | 42.1 | 47.7M | 49G | 10.2ms | 638 images/s | [link](https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l1.pt) |
| EfficientViT-SAM-L2 | 512x512 | 46.6 | 42.7 | 61.3M | 69G | 12.9ms | 538 images/s | [link](https://huggingface.co/han-cai/efficientvit-sam/resolve/main/l2.pt) |
| EfficientViT-SAM-XL0 | 1024x1024 | 47.5 | 43.9 | 117.0M | 185G | 22.5ms | 278 images/s | [link](https://huggingface.co/han-cai/efficientvit-sam/resolve/main/xl0.pt) |
| EfficientViT-SAM-XL1 | 1024x1024 | 47.8 | 44.4 | 203.3M | 322G | 37.2ms | 182 images/s | [link](https://huggingface.co/han-cai/efficientvit-sam/resolve/main/xl1.pt) |
<p align="center">
<b>Table 1: Summary of All EfficientViT-SAM Variants.</b> COCO mAP and LVIS mAP are measured using ViTDet's predicted bounding boxes as the prompt. End-to-end Jetson Orin latency and A100 throughput are measured with TensorRT and fp16.
</p>
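
The table can also be read as a lookup for choosing a variant under a latency budget. The sketch below copies the COCO mAP, Jetson Orin latency, and A100 throughput figures from Table 1; the `VARIANTS` dict and the `pick_variant` helper are hypothetical names used only for illustration and are not part of the `efficientvit` package.

```python
# Figures copied from Table 1 of this README. The dict layout and the helper
# name are illustrative assumptions, not part of the efficientvit package.
VARIANTS = {
    "l0": {"coco_map": 45.7, "orin_latency_ms": 8.2, "a100_images_per_s": 762},
    "l1": {"coco_map": 46.2, "orin_latency_ms": 10.2, "a100_images_per_s": 638},
    "l2": {"coco_map": 46.6, "orin_latency_ms": 12.9, "a100_images_per_s": 538},
    "xl0": {"coco_map": 47.5, "orin_latency_ms": 22.5, "a100_images_per_s": 278},
    "xl1": {"coco_map": 47.8, "orin_latency_ms": 37.2, "a100_images_per_s": 182},
}


def pick_variant(max_latency_ms):
    """Return the variant with the highest COCO mAP whose Jetson Orin
    latency fits the budget, or None if no variant fits."""
    candidates = [
        name
        for name, stats in VARIANTS.items()
        if stats["orin_latency_ms"] <= max_latency_ms
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda name: VARIANTS[name]["coco_map"])


if __name__ == "__main__":
    print(pick_variant(15.0))  # l2
```

The returned key matches the checkpoint file names in the table (`l0.pt`, `xl1.pt`, and so on). These latencies apply only to the measured setup (TensorRT, fp16, batch size 1 on Jetson AGX Orin), so other hardware would need its own measurements.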

## Usage

```python
# Build an EfficientViT-SAM model and load the XL1 checkpoint from a local path
from efficientvit.sam_model_zoo import create_sam_model

efficientvit_sam = create_sam_model(
    name="xl1", weight_url="assets/checkpoints/sam/xl1.pt",
)
# Move the model to the GPU and switch to inference mode
efficientvit_sam = efficientvit_sam.cuda().eval()
```
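
The snippet above expects the checkpoint to already exist at `assets/checkpoints/sam/xl1.pt`. A minimal sketch for fetching it first, using only the standard library and the URL pattern from the checkpoint links in the table, could look like this. The helper names `checkpoint_path` and `ensure_checkpoint` are hypothetical and not part of the `efficientvit` package.

```python
import os
import urllib.request

# URL prefix taken from the checkpoint links in the Pretrained Models table.
BASE_URL = "https://huggingface.co/han-cai/efficientvit-sam/resolve/main"


def checkpoint_path(name, root="assets/checkpoints/sam"):
    """Local path where the checkpoint for a variant (e.g. "xl1") is stored."""
    return os.path.join(root, f"{name}.pt")


def ensure_checkpoint(name, root="assets/checkpoints/sam"):
    """Download <name>.pt into root if it is missing; return the local path."""
    path = checkpoint_path(name, root)
    if not os.path.exists(path):
        os.makedirs(root, exist_ok=True)
        # Network access happens only when the file is not already present.
        urllib.request.urlretrieve(f"{BASE_URL}/{name}.pt", path)
    return path
```

With this helper, the call above could pass `weight_url=ensure_checkpoint("xl1")` instead of a hard-coded path, assuming `weight_url` accepts any local file path as it does in the example.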

```python
from efficientvit.models.efficientvit.sam import EfficientViTSamPredictor

# Wrap the model in a predictor for prompt-based segmentation
efficientvit_sam_predictor = EfficientViTSamPredictor(efficientvit_sam)
```

```python
from efficientvit.models.efficientvit.sam import EfficientViTSamAutomaticMaskGenerator

# Wrap the model in a generator that segments a whole image without prompts
efficientvit_mask_generator = EfficientViTSamAutomaticMaskGenerator(efficientvit_sam)
```
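
Automatic mask generation typically yields many masks per image, so some filtering is usually needed afterwards. The sketch below assumes the generator returns records in the same format as the original Segment Anything `SamAutomaticMaskGenerator`, that is, a list of dicts with keys such as `area` and `predicted_iou`. This README does not state the output format, so check it against the repository. `select_masks` is a hypothetical helper, and the example records are stand-ins rather than real model output.

```python
def select_masks(records, min_iou=0.9, top_k=10):
    """Keep records whose predicted IoU is at least min_iou,
    ordered by area (largest first), and return at most top_k of them."""
    kept = [r for r in records if r["predicted_iou"] >= min_iou]
    kept.sort(key=lambda r: r["area"], reverse=True)
    return kept[:top_k]


# Stand-in records for illustration only. Under the stated assumption, real
# records would also carry the mask itself and other fields.
example = [
    {"area": 1200, "predicted_iou": 0.97},
    {"area": 5400, "predicted_iou": 0.92},
    {"area": 300, "predicted_iou": 0.71},
]

selected = select_masks(example, min_iou=0.9, top_k=2)
print([r["area"] for r in selected])  # [5400, 1200]
```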