metadata
library_name: transformers
license: bsd-3-clause
pipeline_tag: image-to-text

BrainBLIP

This model is not ready for production use and is in the preliminary stages of training. Use at your own risk.

Model Details

Model Description

BrainBLIP is fine-tuned to produce more natural captions for text-to-image training datasets, emphasizing natural language while adding a minimal number of tags for context. The model also introduces "movie rating" tags similar to those CivitAI has implemented:

  • PG_RATING
  • PG13_RATING
  • R_RATING
  • X_RATING
  • XXX_RATING

The model needs much more training data, so these tags are not yet consistent.
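When building a dataset, it can help to separate the rating tag from the rest of the caption. The following is a minimal sketch, not part of the model itself: the helper name `split_rating` and the sample caption are hypothetical, and the card does not document where a tag appears in the output or how the tokenizer renders it. The pattern therefore ignores case and tolerates spaces around the underscore, and the function returns `None` when no tag is present.

```python
import re

# Matches the five rating tags from the list above as whole tokens, so that
# "X_RATING" is not found inside "XXX_RATING". Case and spaces around the
# underscore are tolerated (an assumption about the decoded output format).
_RATING_PATTERN = re.compile(
    r"(?<![a-z0-9_])(xxx|x|r|pg\s*13|pg)\s*_\s*rating(?![a-z0-9_])",
    re.IGNORECASE,
)


def split_rating(caption):
    """Return (caption_without_rating_tags, first_rating_tag_or_None)."""
    ratings = [
        re.sub(r"\s+", "", match.group(1)).upper() + "_RATING"
        for match in _RATING_PATTERN.finditer(caption)
    ]
    # Remove the tags, then tidy the whitespace and commas they leave behind.
    text = _RATING_PATTERN.sub(" ", caption)
    text = re.sub(r"\s+", " ", text)
    text = re.sub(r"\s+,", ",", text)
    text = re.sub(r",(\s*,)+", ",", text)
    text = text.strip(" ,")
    return text, (ratings[0] if ratings else None)


# Hypothetical caption, used only to illustrate the helper.
text, rating = split_rating("a woman walking a dog in a park, PG_RATING")
print(text)
print(rating)
```

Returning `None` rather than a default rating keeps missing tags visible, so captions without a rating can be reviewed by hand.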

How to Get Started with the Model

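import requests
from PIL import Image
from transformers import AutoProcessor, BlipForConditionalGeneration

# The processor comes from the base BLIP checkpoint; the weights come from BrainBLIP.
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("braintacles/brainblip").to("cuda")

# Accepts either a local file path or an http(s) URL.
image_path_or_url = r"https://imagePath_or_url.jpg"
if image_path_or_url.startswith("http"):
    raw_image = Image.open(requests.get(image_path_or_url, stream=True).raw)
else:
    raw_image = Image.open(image_path_or_url)
# Normalize to three channels so grayscale or RGBA inputs are handled.
raw_image = raw_image.convert("RGB")

inputs = processor(raw_image, return_tensors="pt").to("cuda")
# Beam search with a repetition penalty; min_length forces longer captions.
out = model.generate(**inputs, min_length=40, max_new_tokens=75, num_beams=5, repetition_penalty=1.40)
caption = processor.decode(out[0], skip_special_tokens=True)
print(caption)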

Training Details

Training Data

I wrote all captions for this data by hand, with occasional help from GPT4. Very special thanks to the following people, who also contributed a huge amount of time hand-captioning data: