Read the MobileSAM paper this weekend 📖 Sharing some insights!
The idea 💡: The SAM model consists of three parts: a heavy image encoder, a prompt encoder (the prompt can be text, a bounding box, a mask, or a point), and a mask decoder.
To make SAM smaller without compromising performance, the authors looked into three types of distillation.
The first one, coupled distillation, distills the decoder outputs directly (a more naive approach) with a completely randomly initialized small ViT and a randomly initialized mask decoder.
However, when the ViT and the decoder are both in a bad state, this doesn't work well.
![image_1](image_1.jpg)
The second type of distillation is called semi-coupled: here the authors randomly initialized only the ViT image encoder and kept the original mask decoder.
It is called semi-coupled because the image encoder distillation still depends on the mask decoder (see below 👇)
![image_2](image_2.jpg)
The last type of distillation, decoupled distillation, is the most intuitive IMO.
The authors "decoupled" the image encoder altogether: they froze the mask decoder and didn't distill based on generated masks at all. Instead, the small image encoder is trained to reproduce the image embeddings of the original heavy encoder.
This makes sense, as the bottleneck here is the encoder itself, and distillation tends to work well for encoders.
![image_3](image_3.jpg)
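To make the decoupled idea a bit more concrete, here is a toy sketch in NumPy. Everything in it is an assumption for illustration and not the authors' code: the "images" are random 32-d vectors, `teacher_encode` is a hypothetical frozen two-layer map standing in for the heavy encoder, and the student is a single linear layer trained with plain gradient descent. The only point is the shape of the objective: the student is fit to the teacher's embeddings, and no mask decoder or mask loss appears anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 256 "images", each flattened to a 32-d vector (assumption for illustration).
images = rng.normal(size=(256, 32))

# Hypothetical frozen teacher encoder: a fixed two-layer map producing 16-d embeddings.
W_t1 = rng.normal(size=(32, 64)) / np.sqrt(32)
W_t2 = rng.normal(size=(64, 16)) / np.sqrt(64)


def teacher_encode(x):
    return np.tanh(x @ W_t1) @ W_t2


# The teacher is frozen, so its embeddings can be computed once up front.
teacher_emb = teacher_encode(images)


def embedding_mse(student_emb, target_emb):
    """Decoupled objective: compare embeddings directly, no mask decoder involved."""
    return float(np.mean((student_emb - target_emb) ** 2))


def distill(steps=200, lr=1.0):
    """Train a much smaller student (one linear layer) with plain gradient descent."""
    W_s = np.zeros((32, 16))
    losses = []
    for _ in range(steps):
        student_emb = images @ W_s
        losses.append(embedding_mse(student_emb, teacher_emb))
        # Gradient of the mean squared error with respect to W_s.
        grad = 2.0 * images.T @ (student_emb - teacher_emb) / student_emb.size
        W_s -= lr * grad
    return W_s, losses


W_student, losses = distill()
print(f"embedding MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In a real setup you would swap the toy pieces for actual networks in a deep learning framework, but the loss would keep this form: student embeddings against frozen teacher embeddings. The semi-coupled variant would instead push the student embeddings through the frozen mask decoder and compute the loss on the resulting masks.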
Finally, they found that decoupled distillation performs better than coupled distillation in terms of mean IoU and requires much less compute! ♥️
![image_4](image_4.jpg)
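If mean IoU is new to you, here is a minimal sketch of how it can be computed for binary masks. The helper names `mask_iou` and `mean_iou` are made up for this example, and treating two empty masks as a perfect match is just one possible convention. What the reference masks are (ground truth or the masks of the original SAM) depends on the evaluation setup, so take this as an illustration of the metric and not as the paper's evaluation script.

```python
import numpy as np


def mask_iou(pred, target):
    """Intersection over union of two binary masks with the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty; counting this as a perfect match is a convention.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(preds, targets):
    """Average IoU over paired predicted and reference masks."""
    return float(np.mean([mask_iou(p, t) for p, t in zip(preds, targets)]))


preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
targets = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
print(mean_iou(preds, targets))  # 0.75
```

The first pair overlaps on one pixel out of a union of two (IoU 0.5) and the second pair is identical (IoU 1.0), so the mean is 0.75.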
Wanted to leave some links here if you'd like to try it yourself 👇
- MobileSAM [demo](https://huggingface.co/spaces/dhkim2810/MobileSAMMobileSAM)
- Model [repository](https://huggingface.co/dhkim2810/MobileSAM)
If you'd like to experiment with TinyViT, the timm library has a bunch of [checkpoints available](https://huggingface.co/models?sort=trending&search=timm%2Ftinyvit).
![image_5](image_5.jpg)
> [!TIP]
Resources:
[Faster Segment Anything: Towards Lightweight SAM for Mobile Applications](https://arxiv.org/abs/2306.14289)
by Chaoning Zhang, Dongshen Han, Yu Qiao, Jung Uk Kim, Sung-Ho Bae, Seungkyu Lee, Choong Seon Hong (2023)
[GitHub](https://github.com/ChaoningZhang/MobileSAM)
> [!NOTE]
[Original tweet](https://twitter.com/mervenoyann/status/1738959605542076863) (December 24, 2023)