---
license: mit
datasets:
- HuggingFaceM4/flickr30k
language:
- en
library_name: transformers
pipeline_tag: image-to-text
---

# CLIP

In the paper "Learning Transferable Visual Models From Natural Language Supervision," OpenAI introduces CLIP, short for Contrastive Language-Image Pre-training. The model learns how sentences and images are related by being trained to retrieve the most relevant images for a given sentence. What sets CLIP apart is that it is trained on complete sentences rather than single category labels such as "car" or "dog," which lets it pick up richer patterns connecting images and text. Once trained on a large dataset of images and their captions, CLIP can also act as a classifier, and the paper reports that it outperforms models trained directly on ImageNet on classification tasks. The paper itself goes into far more depth and presents the full results.
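
To make the training idea concrete, here is a minimal sketch of the contrastive objective (not the actual training code from the paper): in a batch of matched image-caption pairs, each image embedding is pulled toward its own caption's embedding and pushed away from all the others, and symmetrically for captions. The function name, embedding size, and fixed temperature below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeds, text_embeds, temperature=0.07):
    # Normalize so the dot product becomes a cosine similarity
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Similarity between every image and every caption in the batch
    logits = image_embeds @ text_embeds.t() / temperature

    # Matching image-caption pairs sit on the diagonal
    targets = torch.arange(logits.size(0))

    # Symmetric cross-entropy: pick the right caption for each image and vice versa
    loss_images = F.cross_entropy(logits, targets)
    loss_texts = F.cross_entropy(logits.t(), targets)
    return (loss_images + loss_texts) / 2

# Random embeddings standing in for a batch of 8 image-caption pairs
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```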

## Usage

```python
from PIL import Image
import requests

from transformers import CLIPProcessor, CLIPModel

# Load the checkpoint and its matching processor
model = CLIPModel.from_pretrained("SRDdev/CLIP")
processor = CLIPProcessor.from_pretrained("SRDdev/CLIP")

# Fetch an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Prepare the candidate captions and the image for the model
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)      # softmax over captions gives label probabilities
```
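
Since `probs` holds one probability per candidate caption, the best-matching caption can be read off directly (a small follow-up sketch reusing the captions from the example above):

```python
captions = ["a photo of a cat", "a photo of a dog"]
best = probs.argmax(dim=1).item()  # index of the highest-probability caption
print(f"Best match: {captions[best]} ({probs[0, best].item():.3f})")
```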