---
language:
- fa
library_name: hezar
tags:
- image-to-text
- hezar
metrics:
- wer
pipeline_tag: image-to-text
datasets:
- hezarai/flickr30k-fa
---
A Persian image captioning model built on a ViT + GPT2 encoder-decoder architecture and trained on [flickr30k-fa](https://www.kaggle.com/datasets/sajjadayobi360/flickrfa) (created by Sajjad Ayoubi).
The encoder (ViT) was initialized from https://huggingface.co/google/vit-base-patch16-224 and the decoder (GPT2) was initialized
from https://huggingface.co/HooshvareLab/gpt2-fa.
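The following minimal sketch shows how a ViT encoder and a GPT2 decoder can be wired together. It uses the Hugging Face `transformers` class `VisionEncoderDecoderModel`. This is an assumption for illustration only: the card does not say that hezar builds the model this way. The tiny, randomly initialized configs below are also not the released model's sizes; they are chosen only so the example runs offline. To start from the checkpoints named above, you would instead call `VisionEncoderDecoderModel.from_encoder_decoder_pretrained("google/vit-base-patch16-224", "HooshvareLab/gpt2-fa")`.

```python
# Sketch (assumption): ViT encoder + GPT-2 decoder via transformers' VisionEncoderDecoderModel.
# Tiny random configs are used so this runs offline; they are NOT the released model's sizes.
import torch
from transformers import (
    GPT2Config,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Encoder: a ViT that splits a 32x32 image into 8x8 patches and encodes them.
encoder_config = ViTConfig(
    image_size=32,
    patch_size=8,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)

# Decoder: a GPT-2 language model. from_encoder_decoder_configs marks it as a
# decoder and enables cross-attention over the encoder's outputs.
decoder_config = GPT2Config(
    vocab_size=100,
    n_positions=32,
    n_embd=32,
    n_layer=1,
    n_head=2,
)

config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config)
model = VisionEncoderDecoderModel(config=config)
model.eval()

pixel_values = torch.randn(1, 3, 32, 32)  # one random RGB "image"
decoder_input_ids = torch.tensor([[0, 5, 7]])  # hypothetical caption-token prefix

with torch.no_grad():
    outputs = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids)

# One next-token distribution per caption position: (batch, seq_len, vocab_size)
logits_shape = tuple(outputs.logits.shape)
print(logits_shape)
```

In this setup, the decoder attends to the ViT patch embeddings through cross-attention. Each caption token is therefore predicted from both the preceding tokens and the image.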
## Usage
```
pip install hezar
```
```python
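from hezar.models import Model

# Load the pretrained captioning model from the Hugging Face Hub
model = Model.load("hezarai/vit-gpt2-fa-image-captioning-flickr30k")
# Generate a Persian caption for a local image file
captions = model.predict("example_image.jpg")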
print(captions)
```
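The metadata lists WER (word error rate) as the evaluation metric. The card does not describe how WER was computed, for example how Persian text was tokenized or normalized. The following is therefore only a sketch of the standard definition: the word-level edit distance between a reference caption and a generated caption, divided by the number of words in the reference. The function name `word_error_rate` is hypothetical and is not part of hezar.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper (not hezar's evaluation code): standard WER =
    (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance on whitespace-split words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("a dog runs on the grass", "a dog is running on grass"))  # 0.5
```

Lower is better. A WER of 0.0 means the generated caption has exactly the same word sequence as the reference.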