import streamlit as st
from streamlit_extras.switch_page_button import switch_page

st.title("SegGPT")
st.success("""[Original tweet](https://x.com/mervenoyann/status/1773056450790666568) (March 27, 2024)""", icon="ℹ️")
st.markdown(""" """)
st.markdown("""SegGPT is a vision generalist on image segmentation, quite like GPT for computer vision ✨
It comes with the latest release of 🤗 Transformers.
Technical details, demo and how-tos below!
""")
st.markdown(""" """)
st.image("pages/SegGPT/image_1.jpeg", use_column_width=True)
st.markdown(""" """)
st.markdown("""SegGPT is an extension of <a href='Painter' target='_self'>Painter</a>, where you speak to images with images: the model takes in an image prompt, a transformed version of that prompt, and the actual image you want the same transform applied to, and is expected to output the transformed image.
SegGPT consists of a vanilla ViT with a decoder on top (linear, conv, linear). The model is trained on diverse segmentation examples: it is given example image-mask pairs plus the actual input to be segmented, and the decoder head learns to reconstruct the mask output. 👇🏻
""", unsafe_allow_html=True)
st.markdown(""" """)
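st.markdown("""
To make the idea of speaking to images with images concrete, here is a toy sketch of how a prompt pair and a query could be arranged. It assumes, based on my reading of the Painter codebase, that the prompt is stacked above the query and that the query's target region is hidden for the model to fill in. The tiny nested lists and the `stack_rows` helper are hypothetical stand-ins for real image tensors.

```python
def stack_rows(top, bottom):
    # An image here is a list of pixel rows, so stacking is list concatenation.
    return top + bottom


prompt_image = [[1, 1], [1, 1]]  # example image
prompt_mask = [[0, 1], [0, 1]]   # its segmentation mask (the transformed prompt)
query_image = [[2, 2], [2, 2]]   # image we actually want segmented
hidden = [[None, None], [None, None]]  # unknown mask the model must reconstruct

# Assumed layout: prompt on top, query below, for both canvases.
image_canvas = stack_rows(prompt_image, query_image)
target_canvas = stack_rows(prompt_mask, hidden)
```

The model would see both canvases and be trained to fill in the hidden half of `target_canvas`.
""")
st.markdown(""" """)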
st.image("pages/SegGPT/image_2.jpg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
This generalizes pretty well!
The authors do not claim state-of-the-art results, as the model is mainly used for zero-shot and few-shot inference. They also do prompt tuning, where they freeze the parameters of the model and only optimize the image tensor (the input context).
""")
st.markdown(""" """)
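st.markdown("""
Prompt tuning can be illustrated with a toy example: the weights stay frozen and gradient descent only updates the prompt. This is a minimal sketch, not the authors' training code. The scalar `frozen_model`, its weights, and the learning rate are all assumptions chosen to keep the arithmetic easy to follow.

```python
# Toy stand-in for a frozen network: these weights are never updated.
FROZEN_WEIGHTS = (0.5, 2.0)  # (weight on the prompt, weight on the input)


def frozen_model(prompt, x):
    w_prompt, w_input = FROZEN_WEIGHTS
    return w_prompt * prompt + w_input * x


def tune_prompt(examples, steps=200, lr=0.5):
    # Gradient descent on the prompt only, minimising mean squared error.
    prompt = 0.0
    w_prompt = FROZEN_WEIGHTS[0]
    for _ in range(steps):
        grad = sum(
            2 * (frozen_model(prompt, x) - y) * w_prompt for x, y in examples
        ) / len(examples)
        prompt -= lr * grad
    return prompt


# Targets follow y = 2x + 1.5, so the prompt must supply the 1.5 offset.
examples = [(x, 2.0 * x + 1.5) for x in (0.0, 1.0, 2.0)]
learned_prompt = tune_prompt(examples)
```

Because the prompt weight is 0.5, the learned prompt settles near 3.0 while `FROZEN_WEIGHTS` stays untouched. In SegGPT the learned quantity is an image tensor rather than a single number.
""")
st.markdown(""" """)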
st.image("pages/SegGPT/image_3.jpg", use_column_width=True)
st.markdown(""" """)
st.markdown("""Thanks to 🤗 Transformers you can use this model easily! See [here](https://t.co/U5pVpBhkfK).
""")
st.markdown(""" """)
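st.markdown("""
Here is a rough sketch of what one-shot inference might look like. The class names, the `BAAI/seggpt-vit-large` checkpoint, and the `prompt_images` / `prompt_masks` arguments follow my recollection of the Transformers documentation, so treat them as assumptions and check them against your installed version. `segment_with_prompt` is a hypothetical helper that expects three PIL images, with the mask as a single-channel image.

```python
def segment_with_prompt(image, prompt_image, prompt_mask, checkpoint="BAAI/seggpt-vit-large"):
    # Heavy dependencies are imported inside the function so that merely
    # defining it does not require them to be installed.
    import torch
    from transformers import SegGptForImageSegmentation, SegGptImageProcessor

    processor = SegGptImageProcessor.from_pretrained(checkpoint)
    model = SegGptForImageSegmentation.from_pretrained(checkpoint)

    # The prompt pair (image + mask) tells the model what to segment in `image`.
    inputs = processor(
        images=image,
        prompt_images=prompt_image,
        prompt_masks=prompt_mask,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)

    # PIL reports (width, height); post-processing expects (height, width).
    target_sizes = [image.size[::-1]]
    return processor.post_process_semantic_segmentation(outputs, target_sizes)[0]
```

If the assumptions hold, the call returns a mask for `image` at its original resolution.
""")
st.markdown(""" """)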
st.image("pages/SegGPT/image_4.jpeg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
I have built an app for you to try it out. I combined SegGPT with the Depth Anything model, so you don't have to upload image mask prompts in your prompt pair 🤗
Try it [here](https://t.co/uJIwqJeYUy). Also check out the [collection](https://t.co/HvfjWkAEzP).
""")
st.markdown(""" """)
st.image("pages/SegGPT/image_5.jpeg", use_column_width=True)
st.markdown(""" """)
st.info("""
Resources:
[SegGPT: Segmenting Everything In Context](https://arxiv.org/abs/2304.03284)
by Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang (2023)
[GitHub](https://github.com/baaivision/Painter)""", icon="📚")
st.markdown(""" """)
st.markdown(""" """)
st.markdown(""" """)
col1, col2, col3 = st.columns(3)
with col1:
    if st.button('Previous paper', use_container_width=True):
        switch_page("Painter")
with col2:
    if st.button('Home', use_container_width=True):
        switch_page("Home")
with col3:
    if st.button('Next paper', use_container_width=True):
        switch_page("Grounding DINO")