---
license: apache-2.0
---
# Model Card for cerebras/Cerebras-LLaVA-7B

The checkpoints consist of the language model and projector weights of the multimodal LLaVA-7B model, trained with our Cerebras implementation and training recipe.
The vision encoder checkpoints for this model can be found at [cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V](https://huggingface.co/cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V).

**Note**: _ShareGPT4V_ is added to the vision model name to ensure correct loading of checkpoints in the [LLaVA source repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/model/multimodal_encoder/builder.py#L8).
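
For reference, the naming check in the linked `builder.py` looks approximately like the sketch below. This is paraphrased, not a verbatim copy; `CLIPVisionTower` and the config attribute names are taken from the LLaVA repository, so consult the linked line for the authoritative version.

```python
# Paraphrase of the vision-tower selection logic referenced above; not verbatim.
import os
from llava.model.multimodal_encoder.clip_encoder import CLIPVisionTower

def build_vision_tower(vision_tower_cfg, **kwargs):
    vision_tower = getattr(vision_tower_cfg, 'mm_vision_tower',
                           getattr(vision_tower_cfg, 'vision_tower', None))
    # A CLIP vision tower is only built when the name is a local path or matches
    # a recognized pattern -- including names containing "ShareGPT4V", which is
    # why that suffix appears in the Cerebras ViT checkpoint name.
    if os.path.exists(vision_tower) or vision_tower.startswith("openai") \
            or vision_tower.startswith("laion") or "ShareGPT4V" in vision_tower:
        return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs)
    raise ValueError(f'Unknown vision tower: {vision_tower}')
```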

For full model and training details, please read our paper and release blog post, **to be released shortly**.

# Model Architecture
Cerebras-LLaVA-7B is a transformer model with the following architecture details:
* Vision encoder: [CLIP-VisionModel-Large](https://huggingface.co/cerebras/Cerebras-ViT-L-336-patch14-llava7b-ShareGPT4V). It handles images of size 336 x 336 with a patch size of 14.
* Large Language Model: pretrained from Vicuna-7B checkpoints and instruction-finetuned on various datasets.
* Projector: the module connecting the vision encoder to the LLM; it consists of two linear layers with GELU activation (mlp2x-gelu). A minimal sketch is shown after this list.
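
For illustration only, here is a minimal PyTorch sketch of an mlp2x-gelu projector of this shape. The hidden sizes (1024 for CLIP ViT-L/14-336 features, 4096 for a 7B language model) and the token count are assumptions for the example, not values read from the released checkpoints:

```python
import torch
import torch.nn as nn

# Illustrative sketch of an mlp2x-gelu projector: two linear layers with a GELU
# in between, mapping vision-encoder features into the LLM embedding space.
# The dimensions are assumptions for this example; the real values come from
# the model config.
vision_hidden_size = 1024   # assumed CLIP ViT-L/14-336 feature size
llm_hidden_size = 4096      # assumed 7B LLM hidden size

projector = nn.Sequential(
    nn.Linear(vision_hidden_size, llm_hidden_size),
    nn.GELU(),
    nn.Linear(llm_hidden_size, llm_hidden_size),
)

# 576 patch tokens for a 336 x 336 image with patch size 14 ((336 / 14) ** 2 = 576)
image_features = torch.randn(1, 576, vision_hidden_size)
projected = projector(image_features)   # shape: (1, 576, 4096)
```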
# Loading the model

This model can be loaded directly with the [LLaVA source code repository](https://github.com/haotian-liu/LLaVA). For installation, please refer to the [instructions in the source code repository](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#install).

```python
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "cerebras/Cerebras-LLaVA-7B"

# Load the tokenizer, model, and image processor from the Hugging Face checkpoint
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)
```
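
Continuing from the snippet above, the upstream LLaVA repository's quick-start runs single-image inference by passing an argument object to `eval_model`. The sketch below follows that pattern; the prompt and image URL are placeholders, and the argument names come from the upstream example, so verify them against the LLaVA version you have installed:

```python
# Usage sketch following the upstream LLaVA quick-start pattern; prompt and
# image are placeholders, and argument names should be checked against the
# installed LLaVA version.
prompt = "Describe this image."
image_file = "https://llava-vl.github.io/static/images/view.jpg"

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)
```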
# Acknowledgements
We are thankful to all Cerebras engineers, past and present, who made this work possible.