vit-base-patch16-224-turkish-gpt2-medium
This vision encoder-decoder model uses google/vit-base-patch16-224 as the encoder and ytu-ce-cosmos/turkish-gpt2-medium as the decoder. It was fine-tuned on the flickr8k-turkish dataset to generate image captions in Turkish.
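As a rough illustration of what the encoder name implies, the sketch below computes how many tokens a ViT with 16x16 patches produces for a 224x224 input. The helper name `vit_sequence_length` is hypothetical and not part of any library; it only reproduces the standard ViT arithmetic (one token per patch plus one [CLS] token).

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of tokens a ViT encoder emits: one per patch plus one [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1


# 224 / 16 = 14 patches per side, so 14 * 14 + 1 = 197 tokens
print(vit_sequence_length())
```

Under this assumption, the decoder attends over 197 encoder states for each image while it generates a caption.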
Usage
import torch
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_id = "atasoglu/vit-base-patch16-224-turkish-gpt2-medium"

# Load the image and force RGB, since the ViT processor expects three channels
img = Image.open("example.jpg").convert("RGB")

# Load the image processor, tokenizer, and fine-tuned encoder-decoder weights
feature_extractor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = VisionEncoderDecoderModel.from_pretrained(model_id)
model.to(device)

# Preprocess the image into a batch of pixel tensors
features = feature_extractor(images=[img], return_tensors="pt")
pixel_values = features.pixel_values.to(device)

# Generate token ids and decode them into Turkish captions
generated_captions = tokenizer.batch_decode(
    model.generate(pixel_values, max_new_tokens=20),
    skip_special_tokens=True,
)
print(generated_captions)
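To caption several images, the same calls can be wrapped in a small batching loop. The following is a minimal sketch rather than an official recipe: `chunked` and `caption_files` are hypothetical helpers, and `batch_size=4` and `num_beams=4` are illustrative values, not settings recommended by the model author.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def caption_files(paths, model, processor, tokenizer, device, batch_size=4):
    """Caption image files in batches (hypothetical helper, not part of the model repo)."""
    # Heavy dependencies are imported lazily so that `chunked` stays usable on its own.
    import torch
    from PIL import Image

    captions = []
    for batch in chunked(paths, batch_size):
        images = []
        for path in batch:
            # Convert inside the context manager so the file handle is closed promptly.
            with Image.open(path) as im:
                images.append(im.convert("RGB"))
        pixel_values = processor(images=images, return_tensors="pt").pixel_values.to(device)
        # Inference only, so gradient tracking is disabled.
        with torch.no_grad():
            output_ids = model.generate(pixel_values, max_new_tokens=20, num_beams=4)
        captions.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return captions
```

With the objects from the snippet above, a call might look like `caption_files(["a.jpg", "b.jpg"], model, feature_extractor, tokenizer, device)`, where the file names are placeholders.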