initial commit

Files changed (3)

.gitignore +3 -0
README.md +36 -1
handler.py +46 -0

.gitignore ADDED

@@ -0,0 +1,3 @@
+# PhpStorm / IDEA
+.idea
+

README.md CHANGED

@@ -1,3 +1,38 @@
 ---
-license: bsd-3-clause
 ---

 ---
+tags:
+  - vision
+  - text-to-image
+  - endpoints-template
+inference: true
+pipeline_tag: text-to-image
+base_model: Salesforce/blip-image-captioning-base
+library_name: generic
 ---
+# Fork of [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base) for a `text-to-image` Inference endpoint.
+> Based on https://huggingface.co/sergeipetrov/blip_captioning
+This repository implements a `custom` task for `text-to-image` for 🤗 Inference Endpoints to allow image captioning.
+The code for the customized pipeline is in `handler.py`.
+To deploy this model as an Inference Endpoint, select `Custom` as the task so that the `handler.py` file is used.
+### Expected request payload
+The image to be captioned, sent as raw binary data.
+#### CURL
+```
+curl URL \
+        -X POST \
+        --data-binary @car.png \
+        -H "Content-Type: image/png"
+```
+#### Python
+```python
+import requests
+requests.post(ENDPOINT_URL, headers={"Content-Type": "image/png"}, data=open("car.png", "rb").read()).json()
+```
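The README snippets above post the raw image bytes with an `image/png` content type. A minimal sketch of the same request using only the standard library might look like the following. The function names `build_caption_request` and `caption_image` are hypothetical, and the optional bearer token is an assumption for endpoints that are not public; neither appears in the commit.

```python
import json
import urllib.request
from typing import Any, Optional


def build_caption_request(endpoint_url: str, image_bytes: bytes,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw PNG bytes."""
    headers = {"Content-Type": "image/png"}
    if token is not None:
        # Assumption: a protected endpoint expects a bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        endpoint_url, data=image_bytes, headers=headers, method="POST"
    )


def caption_image(endpoint_url: str, image_path: str,
                  token: Optional[str] = None) -> Any:
    """Send the image file to the endpoint and return the decoded JSON body."""
    with open(image_path, "rb") as image_file:
        request = build_caption_request(endpoint_url, image_file.read(), token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the first function free of network access, so the headers and body can be inspected before anything is posted.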

handler.py ADDED

	@@ -0,0 +1,46 @@

+# +
+from typing import Dict, List, Any
+from PIL import Image
+import torch
+import os
+from io import BytesIO
+from transformers import BlipForConditionalGeneration, BlipProcessor
+# -
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+class EndpointHandler():
+    def __init__(self, path=""):
+        # load the processor and model from the Hub (the `path` argument is not used)
+        self.processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
+        self.model = BlipForConditionalGeneration.from_pretrained(
+            "Salesforce/blip-image-captioning-base"
+        ).to(device)
+        self.model.eval()
+    def __call__(self, data: Any) -> List[Dict[str, Any]]:
+        """
+        Args:
+            data (:obj:`dict`):
+                request payload; the image to be captioned is read from its "inputs" key
+                and the optional "parameters" dict is passed on to `generate`
+        Return:
+            A :obj:`list` containing one item with the generated captions, like
+            [{"generated_text": ["A hugging face at the office"]}]:
+                - "generated_text": A list of strings holding the generated caption(s).
+        """
+        inputs = data.pop("inputs", data)
+        parameters = data.pop("parameters", {})
+        processed_image = self.processor(images=inputs, return_tensors="pt")
+        processed_image["pixel_values"] = processed_image["pixel_values"].to(device)
+        processed_image = {**processed_image, **parameters}
+        with torch.no_grad():
+            out = self.model.generate(
+                **processed_image
+            )
+        captions = self.processor.batch_decode(out, skip_special_tokens=True)
+        # postprocess the prediction
+        return [{"generated_text": captions}]
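Because the handler wraps the decoded captions in a list under `generated_text`, a client needs one extra step to get plain strings. The sketch below shows one way to do that. The helper name `extract_captions` is hypothetical, and accepting a bare string is an added tolerance for other response shapes, not something this handler produces.

```python
from typing import Any, Dict, List


def extract_captions(response: List[Dict[str, Any]]) -> List[str]:
    """Flatten the handler's response into a plain list of caption strings."""
    captions: List[str] = []
    for item in response:
        generated = item.get("generated_text", [])
        # Tolerate a bare string as well as the list this handler returns.
        if isinstance(generated, str):
            captions.append(generated)
        else:
            captions.extend(generated)
    return captions


# Response shape taken from the handler's docstring.
example = [{"generated_text": ["A hugging face at the office"]}]
print(extract_captions(example))
```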