This is OpenAI's DALL-E dVAE with its encoder and decoder wrapped into a single module. The model was uploaded for use with the multi-modal-tokenizers package.
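The intended entry point is the `DalleTokenizer` class from that package; loading this repository takes a single call, shown here and again in the full usage example below:

```python
from multi_modal_tokenizers import DalleTokenizer

# Load this repository's wrapped dVAE as an image tokenizer
image_tokenizer = DalleTokenizer.from_hf("anothy1/dalle-tokenizer")
```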
# Multi-Modal Tokenizers
Multi-modal tokenizers for more than just text: this package provides tools for tokenizing and decoding images and mixed-modal inputs (text and images) using image encoders such as DALL-E's dVAE.
## Installation
To install the package, clone the repository and install it with pip:

```bash
git clone https://github.com/anothy1/multi-modal-tokenizers
pip install ./multi-modal-tokenizers
```

Or install directly from PyPI:

```bash
pip install multi-modal-tokenizers
```
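A quick way to confirm the installation is to import the package under the name used in the examples below:

```python
# Minimal sanity check: the import name used throughout this README
import multi_modal_tokenizers
print(multi_modal_tokenizers.__name__)
```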
## Usage
### Example: Using DalleTokenizer

Below is an example script demonstrating how to use `DalleTokenizer` to encode and decode images.
```python
import io

import requests
from PIL import Image
from IPython.display import display

from multi_modal_tokenizers import DalleTokenizer


def download_image(url):
    resp = requests.get(url)
    resp.raise_for_status()
    return Image.open(io.BytesIO(resp.content))


# Download an image
img = download_image('https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iKIWgaiJUtss/v2/1000x-1.jpg')

# Load the DalleTokenizer from the Hugging Face repository
image_tokenizer = DalleTokenizer.from_hf("anothy1/dalle-tokenizer")

# Encode the image into a sequence of discrete tokens
tokens = image_tokenizer.encode(img)
print("Encoded tokens:", tokens)

# Decode the tokens back to an image
reconstructed = image_tokenizer.decode(tokens)

# Display the reconstructed image (requires a notebook environment)
display(reconstructed)
```
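`display` only renders inside a notebook. When running this as a plain script, a minimal alternative (assuming `decode` returns a PIL image, as the `display` call above and the `img.save` loop in the next example suggest) is to write the reconstruction to disk:

```python
# Assumption: decode() returns a PIL image, so it can be saved directly
reconstructed.save("reconstruction.png")
```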
### Example: Using MixedModalTokenizer

The package also provides `MixedModalTokenizer` for tokenizing and decoding mixed-modal inputs (text and images).
```python
from PIL import Image
from transformers import AutoTokenizer

from multi_modal_tokenizers import MixedModalTokenizer

# Load a pretrained text tokenizer from Hugging Face
text_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Create a MixedModalTokenizer, combining the text tokenizer with the
# image_tokenizer created in the previous example
mixed_tokenizer = MixedModalTokenizer(
    text_tokenizer=text_tokenizer,
    image_tokenizer=image_tokenizer,
    device="cpu"
)

# Example usage: <new_image> marks where the image is inserted into the text
text = "This is an example with <new_image> in the middle."
img_path = "path/to/your/image.jpg"
image = Image.open(img_path)

# Encode the text and image into a single token sequence
encoded = mixed_tokenizer.encode(text=text, images=[image])
print("Encoded mixed-modal tokens:", encoded)

# Decode the sequence back to text and images
decoded_text, decoded_images = mixed_tokenizer.decode(encoded)
print("Decoded text:", decoded_text)
for idx, img in enumerate(decoded_images):
    img.save(f"decoded_image_{idx}.png")
```
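The example suggests that each `<new_image>` placeholder in the text marks where one image's token block is spliced into the combined sequence, and that `decode` reverses the process, returning the recovered text together with one reconstructed image per placeholder. A small sanity check under that assumption:

```python
# Assumption: decode() yields one reconstructed image per <new_image> placeholder
assert len(decoded_images) == text.count("<new_image>")
```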