vit-small-patch16-224-turkish-small-bert-uncased

This vision encoder-decoder model uses WinKawaks/vit-small-patch16-224 as the encoder and ytu-ce-cosmos/turkish-small-bert-uncased as the decoder. It was fine-tuned on the flickr8k-turkish dataset to generate image captions in Turkish.
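The encoder name suggests the usual ViT naming convention: 16×16-pixel patches on 224×224 inputs. Assuming that convention holds for this checkpoint, and that the encoder prepends a single [CLS] token as the standard ViT does, the sketch below works out how many encoder tokens the decoder can attend over for each image. The function name `vit_sequence_length` is illustrative and not part of any library.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16,
                        cls_token: bool = True) -> int:
    """Number of tokens a ViT-style encoder produces for one square image.

    Assumes non-overlapping square patches and, optionally, one extra
    [CLS] token (both are assumptions based on the checkpoint name).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size   # 224 // 16 = 14
    num_patches = patches_per_side ** 2           # 14 * 14 = 196
    return num_patches + (1 if cls_token else 0)


print(vit_sequence_length())  # 197
```

Under these assumptions, every caption is generated from 197 encoder states, regardless of the original image resolution, because the image processor resizes inputs before they reach the encoder.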

Usage

import torch
from PIL import Image
from transformers import AutoTokenizer, VisionEncoderDecoderModel, ViTImageProcessor

# Run on the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_id = "atasoglu/vit-small-patch16-224-turkish-small-bert-uncased"
# Convert to RGB so grayscale or RGBA files still yield three channels.
img = Image.open("example.jpg").convert("RGB")

# Load the image processor, tokenizer, and model from the same repository.
feature_extractor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = VisionEncoderDecoderModel.from_pretrained(model_id)
model.to(device)

# Preprocess the image into a batch of pixel tensors.
features = feature_extractor(images=[img], return_tensors="pt")
pixel_values = features.pixel_values.to(device)

# Generate up to 20 new tokens per image and decode them to text.
generated_captions = tokenizer.batch_decode(
    model.generate(pixel_values, max_new_tokens=20),
    skip_special_tokens=True,
)

print(generated_captions)
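
Because the decoder is an uncased model, the generated captions are presumably all lowercase. If sentence-style capitalization is wanted for display, note that Python's default casing maps "i" to "I", whereas Turkish expects "İ". The helper below is a minimal sketch of a Turkish-aware fix; `capitalize_tr` is a hypothetical name, and the sample strings are made-up inputs rather than actual model output.

```python
def capitalize_tr(text: str) -> str:
    """Capitalize the first letter of a caption using Turkish casing rules."""
    text = text.strip()
    if not text:
        return text
    first = text[0]
    if first == "i":
        upper = "İ"   # dotted lowercase i -> dotted capital İ
    elif first == "ı":
        upper = "I"   # dotless lowercase ı -> dotless capital I
    else:
        upper = first.upper()
    return upper + text[1:]


# Hypothetical captions, used only to illustrate the casing rule.
print(capitalize_tr("iki köpek karda koşuyor"))
print(capitalize_tr("ırmak kenarında bir adam"))
```

It could be applied to the output of the usage snippet with `[capitalize_tr(c) for c in generated_captions]`.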

Model size: 55.8M params (Safetensors, tensor type F32)