luodian committed on
Commit
17283ee
1 Parent(s): 9b21e6e

Update app.py

Files changed (1):
  1. app.py +2 -1
app.py CHANGED
@@ -107,13 +107,14 @@ def http_bot(image_input, text_input, request: gr.Request):
 
 title = """
 # OTTER-HD: A High-Resolution Multi-modality Model
-[[Otter Codebase]](https://github.com/Luodian/Otter) [[Paper]]() [[Checkpoints & Benchmarks]](https://huggingface.co/Otter-AI)
+[[Otter Codebase]](https://github.com/Luodian/Otter) [[Paper]](https://arxiv.org/abs/2311.04219) [[Checkpoints & Benchmarks]](https://huggingface.co/Otter-AI)
 
 **OtterHD** is a multimodal fine-tuned from [Fuyu-8B](https://huggingface.co/adept/fuyu-8b) to facilitate a more fine-grained interpretation of high-resolution visual input *without a explicit vision encoder module*. All image patches are linear transformed and processed together with text tokens. This is a very innovative and elegant exploration. We are fascinated and paved in this way, we opensourced the finetune script for Fuyu-8B and improve training throughput by 4-5 times faster with [Flash-Attention-2](https://github.com/Dao-AILab/flash-attention).
 
 **Tips**:
 - Since high-res images are large that may cause the longer transmit time from HF Space to our backend server. Please be kinda patient for the response.
 - The model is currently mainly focus on high-res image resolution and need to be futher improved on (1) hallucination reduction (2) text formatting control and some more you can spot and suggest to us.
+- We are working on a new set of experiments with around 2M data so you could check the demos and checkpoints later.
 """
 
 css = """