## How to use the model
First, make sure to have `transformers >= 4.35.3`.
The model supports multi-image and multi-prompt generation, meaning that you can pass multiple images in your prompt. Make sure also to follow the correct prompt template and add the token `<image>` at the location where you want to query images:

According to the official code base, it is recommended to use this template:

```bash
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.###Human: <image>\n<prompt>###Assistant:
```

Where `<prompt>` denotes the question asked by the user.
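For a multi-image query, repeat the `<image>` token once per image, in the order the images are passed. Below is a minimal sketch of what such a prompt could look like; `image1` and `image2` are placeholder `PIL` images, and passing them as a list to the processor is an assumption, not code from this card:

```python
# Hypothetical two-image query: one <image> token per image
question = "What is the difference between these two images?"
prompt = f"A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.###Human: <image>\n<image>\n{question}###Assistant:"
# inputs = processor(prompt, [image1, image2], return_tensors="pt")  # assumed multi-image call
```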
### Using `pipeline`:
```python
from transformers import pipeline
from PIL import Image
import requests

model_id = "llava-hf/vip-llava-7b-hf"
pipe = pipeline("image-to-text", model=model_id)

# Example image: a science diagram from the AI2D dataset
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Fill the recommended prompt template with the question
question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
prompt = f"A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.###Human: <image>\n{question}###Assistant:"

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
```
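With the `image-to-text` task, `outputs` is typically a list with one dictionary per input, whose `generated_text` field holds the generation; the model's answer is the text following the final `###Assistant:` marker.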
### Using pure `transformers`:

Below is an example of loading the model in half precision on a GPU. The original snippet stops at the `from_pretrained(` call; the loading arguments shown are common defaults rather than prescriptive choices, so adjust them for your hardware:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, VipLlavaForConditionalGeneration

model_id = "llava-hf/vip-llava-7b-hf"

# Fill the recommended prompt template with the question
question = "What are these?"
prompt = f"A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.###Human: <image>\n{question}###Assistant:"

image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"

model = VipLlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # assumed: half precision for GPU inference
    low_cpu_mem_usage=True,     # assumed loading option
).to(0)
```
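From here, generation follows the standard `transformers` API. The continuation below is a sketch under the same assumptions and is not part of the original snippet:

```python
processor = AutoProcessor.from_pretrained(model_id)

raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(prompt, raw_image, return_tensors="pt").to(0, torch.float16)

# Greedy decoding, capped at 200 new tokens
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
# The decoded text echoes the prompt; the answer follows "###Assistant:"
print(processor.decode(output[0], skip_special_tokens=True))
```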