Spaces: divyareddy/imagebot

luodian committed on Nov 7, 2023

Commit 9b21e6e (1 parent: f5d9401)

Add OtterHD model description

Files changed (1)

app.py CHANGED

@@ -109,6 +109,8 @@ title = """
 # OTTER-HD: A High-Resolution Multi-modality Model
 [[Otter Codebase]](https://github.com/Luodian/Otter) [[Paper]]() [[Checkpoints & Benchmarks]](https://huggingface.co/Otter-AI)
+**OtterHD** is a multimodal model fine-tuned from [Fuyu-8B](https://huggingface.co/adept/fuyu-8b) to enable more fine-grained interpretation of high-resolution visual input *without an explicit vision encoder module*. All image patches are linearly transformed and processed together with text tokens. We find this an innovative and elegant design. Building on it, we open-sourced the fine-tuning script for Fuyu-8B and, using [Flash-Attention-2](https://github.com/Dao-AILab/flash-attention), improved training throughput by 4-5x.
 **Tips**:
 - High-resolution images are large, so transferring them from the HF Space to our backend server can take a while. Please be patient while waiting for a response.
 - The model currently focuses mainly on high-resolution images and still needs improvement in (1) hallucination reduction, (2) text formatting control, and other areas you may spot. Suggestions are welcome.
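The added description says OtterHD, following Fuyu-8B, has no separate vision encoder. Instead, image patches are linearly transformed and processed alongside text tokens. The following toy, pure-Python sketch shows that input path.

The following are illustrative assumptions, not OtterHD's or Fuyu's actual code or configuration:

- the function names
- the patch size and embedding width
- the random weights
- placing image tokens before text tokens

```python
import random
from typing import List

Matrix = List[List[float]]


def patchify(image: List[List[List[float]]], patch: int) -> Matrix:
    """Split an H x W x C image into non-overlapping patch x patch tiles
    in raster order and flatten each tile into one vector."""
    h, w = len(image), len(image[0])
    # Simplifying assumption for the sketch: no padding logic.
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            vec: List[float] = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vec.extend(image[y][x])  # append all C channels of this pixel
            patches.append(vec)
    return patches


def linear(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Row-wise affine map: each row of x times weight, plus bias."""
    out_dim = len(bias)
    return [
        [sum(row[i] * weight[i][j] for i in range(len(row))) + bias[j] for j in range(out_dim)]
        for row in x
    ]


def build_sequence(image, text_embeddings: Matrix, patch: int,
                   weight: Matrix, bias: List[float]) -> Matrix:
    """Hypothetical helper: project image patches to the text embedding width
    and place them before the text tokens, giving one sequence that a
    decoder-only transformer would consume directly (no vision encoder)."""
    image_tokens = linear(patchify(image, patch), weight, bias)
    return image_tokens + text_embeddings


if __name__ == "__main__":
    rng = random.Random(0)
    H, W, C, P, D = 8, 12, 3, 4, 16  # toy sizes, not a real model config
    image = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
    weight = [[rng.gauss(0, 0.02) for _ in range(D)] for _ in range(P * P * C)]
    bias = [0.0] * D
    text = [[rng.gauss(0, 0.02) for _ in range(D)] for _ in range(5)]  # 5 toy text tokens
    seq = build_sequence(image, text, P, weight, bias)
    # (8 / 4) * (12 / 4) = 6 patch tokens + 5 text tokens
    print(len(seq), len(seq[0]))  # prints: 11 16
```

There is no fixed-resolution encoder in this sketch. A larger input image therefore yields more patch tokens, (H/P) * (W/P) of them, rather than being resized to a fixed input size. This is one way to see why dropping the encoder suits high-resolution input.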