mgoin committed
Commit b198789
1 Parent(s): 1404c72

Create README.md

Files changed (1): README.md (+63, -0)
README.md ADDED

This is a quantized version of [laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K](https://huggingface.co/laion/CLIP-ViT-B-32-256x256-DataComp-s34B-b86K) that is ready to use with [DeepSparse](https://github.com/neuralmagic/deepsparse).

It achieves 71.1% zero-shot accuracy on ImageNet.

## Usage

First, install DeepSparse with extensions for CLIP (the version specifier is quoted so the shell does not treat `>` as a redirection):
```bash
pip install "deepsparse-nightly[clip]>=1.7.0.20231210"
```

Download some test images of a church, a dog, and elephants:
```bash
wget -O basilica.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolo/sample_images/basilica.jpg
wget -O buddy.jpeg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/tests/deepsparse/pipelines/sample_images/buddy.jpeg
wget -O thailand.jpg https://raw.githubusercontent.com/neuralmagic/deepsparse/main/src/deepsparse/yolact/sample_images/thailand.jpg
```
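
The pipeline below also expects the quantized ONNX files from this repository, `visual.onnx` and `textual.onnx`, to be in the working directory. A minimal sketch for fetching them with `huggingface_hub` (the repo id is a placeholder and a flat file layout is assumed; adjust to match this repository):
```python
# Sketch: download this repo's files (including visual.onnx and textual.onnx)
# into the current directory. "<this-repo-id>" is a placeholder -- replace it
# with this model's actual Hub id.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="<this-repo-id>", local_dir=".")
```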

Then make and run a pipeline in Python:
```python
import numpy as np

from deepsparse import Pipeline
from deepsparse.clip import (
    CLIPTextInput,
    CLIPTextPipeline,
    CLIPVisualInput,
    CLIPZeroShotInput,
)

def new_process_inputs(self, inputs: CLIPTextInput):
    if not isinstance(inputs.text, list):
        inputs.text = [inputs.text]
    if not isinstance(inputs.text[0], str):
        return inputs.text
    tokens = [np.array(t).astype(np.int32) for t in self.tokenizer(inputs.text)]
    tokens = np.stack(tokens, axis=0)
    tokens_lengths = np.array(tokens.shape[0] * [tokens.shape[1] - 1])
    return [tokens, tokens_lengths]

# This overrides the process_inputs function globally for all CLIPTextPipeline classes,
# so when we make a zero-shot pipeline later that uses this class, it will use this edit!
CLIPTextPipeline.process_inputs = new_process_inputs

possible_classes = ["ice cream", "an elephant", "a dog", "a building", "a church"]
images = ["basilica.jpg", "buddy.jpeg", "thailand.jpg"]

# visual.onnx and textual.onnx are the quantized CLIP image and text encoders from this repo
pipeline = Pipeline.create(task="clip_zeroshot", visual_model_path="visual.onnx", text_model_path="textual.onnx")

pipeline_input = CLIPZeroShotInput(
    image=CLIPVisualInput(images=images),
    text=CLIPTextInput(text=possible_classes),
)

output = pipeline(pipeline_input).text_scores
for i in range(len(output)):
    prediction = possible_classes[np.argmax(output[i])]
    print(f"Image {images[i]} is a picture of {prediction}")

"""
Image basilica.jpg is a picture of a church
Image buddy.jpeg is a picture of a dog
Image thailand.jpg is a picture of an elephant
"""
```
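
Optionally, DeepSparse's benchmarking CLI gives a rough throughput number for a single model file. A minimal sketch on the image encoder, leaving every flag at its default (`visual.onnx` is the same file used above):
```bash
deepsparse.benchmark visual.onnx
```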