SegGPT is a vision generalist for image segmentation, quite like GPT for computer vision ✨
It comes with the latest release of 🤗 Transformers.
🎁 Technical details, demo and how-tos below!
![image_1](image_1.jpg)
SegGPT is an extension of Painter, where you speak to images with images: the model takes in an image prompt, a transformed version of that prompt, and the actual image you want the same transform applied to, and it is expected to output the transformed image.
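
To make the three inputs concrete, here is a minimal PyTorch sketch of how such an in-context sample could be laid out. Stitching along the height axis, the tensor sizes, treating masks as 3-channel images, and the helper name `build_in_context_canvas` are all assumptions for illustration, not the authors' code.

```python
import torch


def build_in_context_canvas(prompt_image, prompt_mask, target_image):
    """Stack a prompt pair and a target image into two canvases (illustrative).

    All inputs are (channels, height, width) tensors of the same shape.
    Returns the image canvas, the mask canvas, and a boolean map marking
    the region the model is expected to fill in.
    """
    # Top half: the prompt image. Bottom half: the image to segment.
    image_canvas = torch.cat([prompt_image, target_image], dim=1)

    # Top half: the prompt mask. Bottom half: unknown, so left as zeros.
    unknown = torch.zeros_like(prompt_mask)
    mask_canvas = torch.cat([prompt_mask, unknown], dim=1)

    # True wherever the output has to be predicted (the bottom half).
    to_predict = torch.zeros(mask_canvas.shape[1:], dtype=torch.bool)
    to_predict[prompt_mask.shape[1]:, :] = True
    return image_canvas, mask_canvas, to_predict


# Random stand-ins for real images and masks.
prompt_image = torch.rand(3, 64, 64)
prompt_mask = torch.rand(3, 64, 64)
target_image = torch.rand(3, 64, 64)

image_canvas, mask_canvas, to_predict = build_in_context_canvas(
    prompt_image, prompt_mask, target_image
)
print(image_canvas.shape, mask_canvas.shape, to_predict.sum().item())
```

The point of the layout is that the task is never named: the model only sees what the prompt pair looks like and has to repeat the same transform on the bottom half.
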
SegGPT consists of a vanilla ViT with a decoder on top (linear, conv, linear). The model is trained on diverse segmentation examples: it is given example image-mask pairs along with the actual input to be segmented, and the decoder head learns to reconstruct the mask output. 👇🏻
![image_2](image_2.jpg)
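
As a rough picture of the "linear, conv, linear" head, the sketch below turns ViT patch tokens back into a pixel grid. It is a toy under stated assumptions: the class name `ToyDecoderHead`, every dimension, the GELU activation, and the exact reshaping are hypothetical choices and may differ from the released model.

```python
import torch
from torch import nn


class ToyDecoderHead(nn.Module):
    """Illustrative linear -> conv -> linear head on top of ViT patch tokens."""

    def __init__(self, hidden_size=32, decoder_dim=16, patch_size=4, out_channels=3):
        super().__init__()
        self.patch_size = patch_size
        self.decoder_dim = decoder_dim
        # Linear: expand each patch token into features for its patch_size x patch_size pixels.
        self.expand = nn.Linear(hidden_size, patch_size * patch_size * decoder_dim)
        # Conv: mix information between neighbouring pixels.
        self.conv = nn.Conv2d(decoder_dim, decoder_dim, kernel_size=3, padding=1)
        self.act = nn.GELU()
        # Linear: per-pixel projection to the output (mask) channels.
        self.project = nn.Linear(decoder_dim, out_channels)

    def forward(self, tokens, grid_h, grid_w):
        batch = tokens.shape[0]
        p, d = self.patch_size, self.decoder_dim
        x = self.expand(tokens)                        # (B, N, p*p*d)
        x = x.reshape(batch, grid_h, grid_w, p, p, d)  # split tokens into a patch grid
        x = x.permute(0, 5, 1, 3, 2, 4)                # (B, d, grid_h, p, grid_w, p)
        x = x.reshape(batch, d, grid_h * p, grid_w * p)
        x = self.act(self.conv(x))                     # (B, d, H, W)
        x = self.project(x.permute(0, 2, 3, 1))        # (B, H, W, out_channels)
        return x.permute(0, 3, 1, 2)                   # (B, out_channels, H, W)


head = ToyDecoderHead()
tokens = torch.rand(2, 8 * 4, 32)  # 2 samples, an 8 x 4 grid of patch tokens
out = head(tokens, grid_h=8, grid_w=4)
print(out.shape)
```
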
This generalizes pretty well! The authors do not claim state-of-the-art results, as the model is mainly used for zero-shot and few-shot inference. They also do prompt tuning, where they freeze the parameters of the model and only optimize the image tensor (the input context).
![image_3](image_3.jpg)
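
The prompt tuning idea can be sketched in a few lines of PyTorch. This is only a toy under stated assumptions: the tiny convolutional `model` is a hypothetical stand-in for a pretrained network, the prompt is concatenated along the channel axis for simplicity, and the data, loss, and optimizer settings are made up.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for a pretrained segmentation model (hypothetical, tiny on purpose).
# It sees the prompt and the image to segment concatenated along channels.
model = nn.Sequential(
    nn.Conv2d(6, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)

# Freeze every model parameter.
for param in model.parameters():
    param.requires_grad_(False)

# The only trainable tensor is the prompt itself.
prompt = nn.Parameter(torch.zeros(1, 3, 16, 16))
optimizer = torch.optim.Adam([prompt], lr=0.1)

# Toy training data: random images with random binary masks.
images = torch.rand(4, 3, 16, 16)
masks = (torch.rand(4, 1, 16, 16) > 0.5).float()
loss_fn = nn.BCEWithLogitsLoss()

weights_before = [p.clone() for p in model.parameters()]
losses = []
for step in range(50):
    optimizer.zero_grad()
    # Share the same learnable prompt across the whole batch.
    batch_prompt = prompt.expand(images.shape[0], -1, -1, -1)
    inputs = torch.cat([batch_prompt, images], dim=1)
    loss = loss_fn(model(inputs), masks)
    loss.backward()  # gradients reach only `prompt`, the weights are frozen
    optimizer.step()
    losses.append(loss.item())

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because only `prompt` is handed to the optimizer and the weights have `requires_grad` turned off, the model stays exactly as it was; what gets learned is a context tensor specialized to the task.
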
Thanks to 🤗 Transformers you can use this model easily! See [here](https://t.co/U5pVpBhkfK).
![image_4](image_4.jpg)
I have built an app for you to try it out. I combined SegGPT with the Depth Anything model, so you don't have to upload image mask prompts in your prompt pair 🤗
Try it [here](https://t.co/uJIwqJeYUy). Also check out the [collection](https://t.co/HvfjWkAEzP).
![image_5](image_5.jpg)
> [!TIP]
> Resources:
> [SegGPT: Segmenting Everything In Context](https://arxiv.org/abs/2304.03284)
> by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang (2023)
> [GitHub](https://github.com/baaivision/Painter)
> [!NOTE]
> [Original tweet](https://x.com/mervenoyann/status/1773056450790666568) (March 27, 2024)