---
license: mit
datasets:
- HuggingFaceM4/flickr30k
language:
- en
library_name: transformers
pipeline_tag: image-to-text
---

# CLIP

In the paper "Learning Transferable Visual Models From Natural Language Supervision," OpenAI introduces CLIP, short for Contrastive Language-Image Pre-training. The model learns how sentences and images are related by being trained to retrieve the most relevant images for a given sentence. What sets CLIP apart is that it is trained on complete sentences rather than single category labels such as "car" or "dog," which lets it pick up richer patterns connecting images and text. Once trained on a large dataset of images and their captions, CLIP can also act as a classifier, and the paper reports that it outperforms models trained directly on ImageNet on classification tasks. The paper itself goes into far more depth and presents the full results.
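
To make the training idea concrete, here is a minimal sketch of the contrastive objective (not the actual training code from the paper): in a batch of matched image-caption pairs, each image embedding is pulled toward its own caption's embedding and pushed away from all the others, and symmetrically for captions. The function name, embedding size, and fixed temperature below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeds, text_embeds, temperature=0.07):
    # Normalize so the dot product becomes a cosine similarity
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Similarity between every image and every caption in the batch
    logits = image_embeds @ text_embeds.t() / temperature

    # Matching image-caption pairs sit on the diagonal
    targets = torch.arange(logits.size(0))

    # Symmetric cross-entropy: pick the right caption for each image and vice versa
    loss_images = F.cross_entropy(logits, targets)
    loss_texts = F.cross_entropy(logits.t(), targets)
    return (loss_images + loss_texts) / 2

# Random embeddings standing in for a batch of 8 image-caption pairs
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```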

## Usage

```python
from PIL import Image
import requests

from transformers import CLIPProcessor, CLIPModel

# Load the checkpoint and its matching processor
model = CLIPModel.from_pretrained("SRDdev/CLIP")
processor = CLIPProcessor.from_pretrained("SRDdev/CLIP")

# Fetch an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Prepare the candidate captions and the image for the model
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)      # softmax over captions gives label probabilities
```
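
Since `probs` holds one probability per candidate caption, the best-matching caption can be read off directly (a small follow-up sketch reusing the captions from the example above):

```python
captions = ["a photo of a cat", "a photo of a dog"]
best = probs.argmax(dim=1).item()  # index of the highest-probability caption
print(f"Best match: {captions[best]} ({probs[0, best].item():.3f})")
```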