luodian committed on
Commit
17283ee
1 Parent(s): 9b21e6e

Update app.py

Files changed (1):
  1. app.py +2 -1
app.py CHANGED
@@ -107,13 +107,14 @@ def http_bot(image_input, text_input, request: gr.Request):
 
 title = """
 # OTTER-HD: A High-Resolution Multi-modality Model
-[[Otter Codebase]](https://github.com/Luodian/Otter) [[Paper]]() [[Checkpoints & Benchmarks]](https://huggingface.co/Otter-AI)
+[[Otter Codebase]](https://github.com/Luodian/Otter) [[Paper]](https://arxiv.org/abs/2311.04219) [[Checkpoints & Benchmarks]](https://huggingface.co/Otter-AI)
 
 **OtterHD** is a multimodal fine-tuned from [Fuyu-8B](https://huggingface.co/adept/fuyu-8b) to facilitate a more fine-grained interpretation of high-resolution visual input *without a explicit vision encoder module*. All image patches are linear transformed and processed together with text tokens. This is a very innovative and elegant exploration. We are fascinated and paved in this way, we opensourced the finetune script for Fuyu-8B and improve training throughput by 4-5 times faster with [Flash-Attention-2](https://github.com/Dao-AILab/flash-attention).
 
 **Tips**:
 - Since high-res images are large that may cause the longer transmit time from HF Space to our backend server. Please be kinda patient for the response.
 - The model is currently mainly focus on high-res image resolution and need to be futher improved on (1) hallucination reduction (2) text formatting control and some more you can spot and suggest to us.
+- We are working on a new set of experiments with around 2M data so you could check the demos and checkpoints later.
 """
 
 css = """