---
license: openrail
inference: false
pipeline_tag: image-to-text
tags:
- image-to-text
- visual-question-answering
- image-captioning
datasets:
- coco
- textvqa
- VQAv2
- OK-VQA
- A-OKVQA
language:
- en

---

# QuickStart

## Installation
```bash
pip install promptcap
```

## Captioning Pipeline

Please follow the prompt format below, which gives the best performance.

Generate a prompt-guided caption as follows:
```python
import torch
from promptcap import PromptCap

model = PromptCap("vqascore/promptcap-coco-vqa")  # OFA checkpoints are also supported, e.g. "OFA-Sys/ofa-large"

if torch.cuda.is_available():
  model.cuda()

prompt = "please describe this image according to the given question: what piece of clothing is this boy putting on?"
image = "glove_boy.jpeg"

print(model.caption(prompt, image))
```

For generic captioning, simply use the prompt "please describe this image according to the given question: what does the image describe?", for example:
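
A minimal sketch reusing the `model` loaded above (the image path is a placeholder):

```python
# Generic captioning: use the generic prompt instead of a question-specific one.
prompt = "please describe this image according to the given question: what does the image describe?"
image = "glove_boy.jpeg"  # placeholder image path; substitute your own file

print(model.caption(prompt, image))
```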

PromptCap also supports OCR inputs:

```python
prompt = "please describe this image according to the given question: what year was this taken?"
image = "dvds.jpg"
ocr = "yip AE Mht juor 02/14/2012"

print(model.caption(prompt, image, ocr))
```



## Visual Question Answering Pipeline

Unlike typical VQA models, which perform classification over the VQAv2 answer vocabulary, PromptCap is open-domain and can be paired with arbitrary text-QA models.
Here we provide a pipeline for combining PromptCap with UnifiedQA.

```python
import torch
from promptcap import PromptCap_VQA

# The QA model supports all UnifiedQA variants, e.g. "allenai/unifiedqa-v2-t5-large-1251000"
vqa_model = PromptCap_VQA(promptcap_model="vqascore/promptcap-coco-vqa", qa_model="allenai/unifiedqa-t5-base")

if torch.cuda.is_available():
  vqa_model.cuda()

question = "what piece of clothing is this boy putting on?"
image = "glove_boy.jpeg"

print(vqa_model.vqa(question, image))
```

Similarly, PromptCap supports OCR inputs:

```python
question = "what year was this taken?"
image = "dvds.jpg"
ocr = "yip AE Mht juor 02/14/2012"

print(vqa_model.vqa(question, image, ocr=ocr))
```

Because of the flexibility of UnifiedQA, PromptCap also supports multiple-choice VQA:

```python
question = "what piece of clothing is this boy putting on?"
image = "glove_boy.jpeg"
choices = ["gloves", "socks", "shoes", "coats"]
print(vqa_model.vqa_multiple_choice(question, image, choices))
```