# Image-to-Poem Generator

This project generates poems from input images. It uses the Hugging Face Transformers library with a fine-tuned GIT model (`Sourabh2/git-base-poem`) to produce poetic descriptions of visual content.

## Table of Contents

1. [Installation](#installation)
2. [Usage](#usage)
3. [Model Information](#model-information)
4. [Function Description](#function-description)
5. [Example](#example)
6. [Requirements](#requirements)
7. [License](#license)

## Installation

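To use this image-to-poem generator, install the required libraries with pip, for example `pip install transformers Pillow torch`. PyTorch is included here because the usage code below requests PyTorch tensors (`return_tensors="pt"`).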

## Usage

1. First, import the necessary modules and load the pre-trained model:

   ```python
   from transformers import AutoProcessor, AutoModelForCausalLM
   from PIL import Image

   processor = AutoProcessor.from_pretrained("Sourabh2/git-base-poem")
   model = AutoModelForCausalLM.from_pretrained("Sourabh2/git-base-poem")
   ```

2. Define the `generate_caption` function:

   ```python
   def generate_caption(image_path):
       image = Image.open(image_path)
       inputs = processor(images=image, return_tensors="pt")
       pixel_values = inputs.pixel_values
       generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
       generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
       return generated_caption
   ```

3. Use the function to generate a poem from an image:

   ```python
   image_path = "/path/to/your/image.jpg"
   output = generate_caption(image_path)
   print(output)
   ```

## Model Information

This project uses the "Sourabh2/git-base-poem" model, which is a fine-tuned version of the GIT (Generative Image-to-text Transformer) model. It has been specifically trained to generate poetic descriptions of images.

## Function Description

The `generate_caption` function takes an image file path as input and returns a generated poem. Here's what it does:

  1. Opens the image file using PIL (Python Imaging Library).
  2. Processes the image using the pre-trained processor.
  3. Generates a poetic caption using the pre-trained model.
  4. Decodes the generated output and returns it as a string.
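
The four steps above can also be sketched as a variant that receives the processor and model as arguments instead of reading them from module-level globals. This is only an illustration: the name `generate_poem`, the `max_length` parameter, and the explicit RGB conversion are assumptions added here and are not part of the original function.

```python
from PIL import Image


def generate_poem(image_path, processor, model, max_length=50):
    """Hypothetical variant of generate_caption with explicit dependencies."""
    # Step 1: open the image file with PIL. Converting to RGB is an added
    # assumption so that grayscale or RGBA files reach the processor with
    # three channels; the original function passes the image through as opened.
    with Image.open(image_path) as img:
        image = img.convert("RGB")

    # Step 2: let the processor turn the image into model inputs
    # (PyTorch tensors, because of return_tensors="pt").
    inputs = processor(images=image, return_tensors="pt")

    # Step 3: generate token ids for the poem, capped at max_length tokens.
    generated_ids = model.generate(
        pixel_values=inputs.pixel_values, max_length=max_length
    )

    # Step 4: decode the token ids back into text and return the first result.
    texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
    return texts[0]
```

With `processor` and `model` loaded as in the Usage section, `generate_poem("/path/to/your/image.jpg", processor, model)` should follow the same path as `generate_caption`, apart from the RGB conversion. Passing the objects in explicitly makes it easier to swap in another checkpoint or to exercise the function with stand-in objects.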

## Example

```python
image_path = "/content/12330616_72ed8075fa.jpg"
output = generate_caption(image_path)
print(output)
```

This will print the generated poem based on the content of the image at the specified path.

## Requirements

- Python 3.6+
- `transformers` library
- Pillow (PIL) library
- PyTorch (`torch`), which the usage code relies on through `return_tensors="pt"`
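
As a quick sanity check before loading the model, the requirements can be verified from Python. The sketch below is a hypothetical helper, not part of this project: the names `REQUIRED_MODULES`, `missing_modules`, and `python_is_supported` are made up for illustration, and `torch` is included on the assumption that the usage code needs PyTorch because it requests `return_tensors="pt"`.

```python
import importlib.util
import sys

# Import names of the required packages. Pillow is installed as "Pillow"
# but imported as "PIL". "torch" is an assumption based on the usage code
# requesting PyTorch tensors.
REQUIRED_MODULES = ("transformers", "PIL", "torch")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 6), version=None):
    """Check a (major, minor) version against the documented minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum
```

Calling `missing_modules()` returns an empty list when everything is importable; any names it returns are the packages still to be installed.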