import streamlit as st
from streamlit_extras.switch_page_button import switch_page
st.title("Painter")
st.success("""[Original tweet](https://twitter.com/mervenoyann/status/1771542172946354643) (March 23, 2024)""", icon="ℹ️")
st.markdown(""" """)
st.markdown("""I read the Painter [paper](https://t.co/r3aHp29mjf) by [BAAIBeijing](https://x.com/BAAIBeijing) to convert the weights to 🤗 Transformers, and I absolutely loved the approach they took, so I wanted to take the time to unfold it here!
""")
st.markdown(""" """)
st.image("pages/Painter/image_1.jpeg", use_column_width=True)
st.markdown(""" """)
st.markdown("""Essentially, this model takes inspiration from in-context learning: with LLMs, you give an example input-output pair followed by the actual input you want the model to complete (one-shot learning). The authors adapted this to images, thus the name "Images Speak in Images".
This model doesn't have any multimodal parts: it just has an image encoder and a decoder head (linear layer, conv layer, another linear layer), so it's single-modality.
The magic sauce is the data: they input the task as an example image with its associated transformation, plus another image they want the transformation applied to, and take a smooth L1 loss between the predictions and the ground truth. This is like the T5 of image models 😀
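
To make that recipe concrete, here's a toy sketch in plain Python. It is my own illustration, not the authors' code: the helper names (`hstack`, `build_example`, `masked_smooth_l1`), the 2x2 canvas layout, and the zero fill for the hidden region are all assumptions, and the loss is a smooth L1 style regression loss, which as far as I can tell is what the paper uses. Images are tiny grids of floats, and the example and query are assumed to have the same size.

```python
# Toy sketch only: helper names and the 2x2 canvas layout are illustrative
# assumptions, not taken from the Painter codebase.

def hstack(left, right):
    # Join two H x W grids (lists of rows) side by side.
    return [l_row + r_row for l_row, r_row in zip(left, right)]

def build_example(prompt_img, prompt_tgt, query_img, query_tgt, fill=0.0):
    # Top half of the canvas: the worked example (image | its task output).
    # Bottom half: the query image | the output the model has to paint.
    h, w = len(query_tgt), len(query_tgt[0])
    hidden = [[fill] * w for _ in range(h)]
    canvas_in = hstack(prompt_img, prompt_tgt) + hstack(query_img, hidden)
    canvas_gt = hstack(prompt_img, prompt_tgt) + hstack(query_img, query_tgt)
    # The mask is True only where the query output was hidden.
    mask = ([[False] * (2 * w) for _ in range(h)]
            + [[False] * w + [True] * w for _ in range(h)])
    return canvas_in, canvas_gt, mask

def smooth_l1(pred, target, beta=1.0):
    # Quadratic near zero, linear for large errors.
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta

def masked_smooth_l1(pred, target, mask):
    # Average the per-pixel loss over the hidden pixels only.
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += smooth_l1(p, t)
                count += 1
    return total / count if count else 0.0

# Task shown by the example pair: invert a binary image.
prompt_img = [[0.0, 1.0], [1.0, 0.0]]
prompt_tgt = [[1.0, 0.0], [0.0, 1.0]]
query_img = [[0.0, 0.0], [1.0, 1.0]]
query_tgt = [[1.0, 1.0], [0.0, 0.0]]

canvas_in, canvas_gt, mask = build_example(prompt_img, prompt_tgt, query_img, query_tgt)
# Stand-ins for a model output: the blank canvas, then a perfect painting.
loss_untrained = masked_smooth_l1(canvas_in, canvas_gt, mask)
loss_perfect = masked_smooth_l1(canvas_gt, canvas_gt, mask)
print(loss_untrained, loss_perfect)
```

Swapping the example pair for a different task changes what the model is asked to paint without touching the weights, and that is the in-context part.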
""")
st.markdown(""" """)
st.image("pages/Painter/image_2.jpeg", use_column_width=True)
st.markdown(""" """)
st.markdown("""What is so cool about it is that it can actually adapt to out-of-domain tasks: in the chart below, it was trained on the tasks above the dashed line, and the authors found that it generalized to the tasks below the line. Image tasks generalize well 🤯
""")
st.markdown(""" """)
st.image("pages/Painter/image_3.jpeg", use_column_width=True)
st.markdown(""" """)
st.info("""
Resources:
[Images Speak in Images: A Generalist Painter for In-Context Visual Learning](https://arxiv.org/abs/2212.02499)
by Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, Tiejun Huang (2022)
[GitHub](https://github.com/baaivision/Painter)""", icon="📚")
st.markdown(""" """)
st.markdown(""" """)
st.markdown(""" """)
col1, col2, col3 = st.columns(3)
with col1:
    if st.button('Previous paper', use_container_width=True):
        switch_page("LLaVA-NeXT")
with col2:
    if st.button('Home', use_container_width=True):
        switch_page("Home")
with col3:
    if st.button('Next paper', use_container_width=True):
        switch_page("SegGPT")