Add OtterHD model description
Changed file: app.py
@@ -109,6 +109,8 @@ title = """
 # OTTER-HD: A High-Resolution Multi-modality Model
 [[Otter Codebase]](https://github.com/Luodian/Otter) [[Paper]]() [[Checkpoints & Benchmarks]](https://huggingface.co/Otter-AI)
 
+**OtterHD** is a multimodal model fine-tuned from [Fuyu-8B](https://huggingface.co/adept/fuyu-8b) to enable finer-grained interpretation of high-resolution visual input *without an explicit vision encoder module*. All image patches are linearly transformed and processed together with the text tokens. This is a very innovative and elegant exploration. Fascinated by this direction, we open-sourced a fine-tuning script for Fuyu-8B and, using [Flash-Attention-2](https://github.com/Dao-AILab/flash-attention), made training throughput 4-5 times higher.
+
 **Tips**:
 - High-resolution images are large, so transmitting them from the HF Space to our backend server may take a while. Please be patient while waiting for the response.
 - The model currently focuses mainly on high-resolution images and still needs improvement in (1) hallucination reduction, (2) text formatting control, and other areas you may spot and suggest to us.
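The added description says OtterHD, like Fuyu-8B, has no separate vision encoder: image patches are linearly projected and processed together with text tokens. The NumPy sketch below illustrates that idea under toy assumptions. The helper names (`patchify`, `build_input_sequence`), the patch size, the model width, and the image-before-text ordering are hypothetical choices for illustration, not OtterHD's or Fuyu's actual code. The real Fuyu-8B pipeline (for example `FuyuProcessor` and `FuyuForCausalLM` in Hugging Face `transformers`) also handles details omitted here, such as special tokens and image resizing/padding.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patch x patch tiles.

    Returns shape (num_patches, patch * patch * C) in raster (row-major) order.
    Assumption for this sketch: H and W are exact multiples of `patch`.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    rows, cols = h // patch, w // patch
    tiles = image.reshape(rows, patch, cols, patch, c)  # split both spatial axes
    tiles = tiles.transpose(0, 2, 1, 3, 4)              # -> (rows, cols, patch, patch, c)
    return tiles.reshape(rows * cols, patch * patch * c)


def build_input_sequence(image, text_token_ids, patch, w_patch, b_patch, text_embedding):
    """Hypothetical helper: linearly project patches, then append text-token embeddings.

    No vision encoder is involved: each flattened patch goes through a single
    affine map (w_patch, b_patch) into the same width as the text embeddings.
    """
    patch_embeds = patchify(image, patch) @ w_patch + b_patch  # (num_patches, d_model)
    text_embeds = text_embedding[text_token_ids]               # (num_text, d_model)
    return np.concatenate([patch_embeds, text_embeds], axis=0)


# Toy usage with random weights (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
patch, channels, d_model, vocab = 4, 3, 8, 100
image = rng.random((8, 12, channels))
w_patch = rng.standard_normal((patch * patch * channels, d_model)) * 0.02
b_patch = np.zeros(d_model)
text_embedding = rng.standard_normal((vocab, d_model)) * 0.02
seq = build_input_sequence(image, [5, 17, 42], patch, w_patch, b_patch, text_embedding)
print(seq.shape)  # (9, 8): 2 x 3 = 6 image patches + 3 text tokens
```

In this sketch, a larger image only produces more patch tokens; the projection and the rest of the model stay the same, which is why this style of design can take high-resolution input without a fixed-size image encoder.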