Last month was great for faster/smaller segmentation models, so I wanted to dedicate my first post to compiling the recently released SAM variants! 🤗 📚 All models and their demos can be found in this collection 👉🏼 merve/segment-anything-model-6585835fc76915aa14e2bcbd

The ideas behind them mostly revolve around making SAM's heavy image encoder lighter, either through distillation or by changing the pre-training. 💡

⚡️ MobileSAM: It decouples the heavy image encoder of SAM and distills it into a TinyViT to make SAM smaller. The architecture is otherwise the same; only the encoder changes.

⚡️ TinySAM: It distills the whole model with online hard prompt sampling. The authors also quantized it and released Q-TinySAM.

⚡️ EfficientSAM: It uses masked image pre-training (like ViTMAE, where the model learns to reconstruct images) to train a lightweight image encoder, which is then combined with the mask decoder.

⚡️ FastSAM: It's a CNN-based model where the problem is framed as segment generation. At inference time everything is segmented at once, and then you can prompt with boxes, points, or text (this is how it resembles SAM). The architecture itself is nothing like the original SAM.

✨ [NEW] SlimSAM: It's a pruned-and-distilled version of pre-trained SAM. The architecture is the same, so @nielsr recently converted the weights and you can use it with the same API you use for SAM models (see the minimal sketch at the end of this post). You can find the available checkpoints in the collection.

I hope you liked it!
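
P.S. Here is a minimal sketch of what "same API" means in practice, using the standard transformers SAM classes. The checkpoint ID, the example image URL, and the point prompt are placeholders I picked for illustration; check the collection above for the actual converted SlimSAM checkpoints.

```python
# Minimal sketch: prompting a SlimSAM checkpoint through the standard
# transformers SAM API. The checkpoint ID, image URL, and point prompt
# below are placeholders — see the collection for the real checkpoint names.
import requests
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

checkpoint = "nielsr/slimsam-50-uniform"  # placeholder checkpoint ID
model = SamModel.from_pretrained(checkpoint)
processor = SamProcessor.from_pretrained(checkpoint)

# Any RGB image works; this URL is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Prompt with a single 2D point (x, y) on the object you want segmented.
input_points = [[[450, 600]]]

inputs = processor(image, input_points=input_points, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the low-resolution mask logits back to the original image size.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(masks[0].shape, outputs.iou_scores)
```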