---
library_name: transformers
license: bsd-3-clause
pipeline_tag: image-to-text
---

# BrainBLIP

**This model is not ready for production use and is in preliminary stages of training. Use at your own risk.**

## Model Details

### Model Description

BrainBLIP is fine-tuned to produce more natural captions for text-to-image training datasets, with an emphasis on natural language and a minimal number of tags for context.

This model also introduces "movie rating" tags similar to [what CivitAI has implemented](https://education.civitai.com/civitais-guide-to-content-levels/):

- PG_RATING
- PG13_RATING
- R_RATING
- X_RATING
- XXX_RATING

The model still needs a lot more training data, so these rating tags are not yet consistent; see the sketch below for separating the tag from the caption text.
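
Downstream scripts may want the rating separated from the descriptive text. The sketch below is illustrative, not part of the model: only the tag names come from the list above, while the `split_rating` helper and the example caption are assumptions.

```py
# Check longer tags first: "X_RATING" is a substring of "XXX_RATING".
RATING_TAGS = ["XXX_RATING", "X_RATING", "R_RATING", "PG13_RATING", "PG_RATING"]

def split_rating(caption: str):
    """Return (rating tag or None, caption with the tag stripped). Hypothetical helper."""
    for tag in RATING_TAGS:
        if tag in caption:
            return tag, caption.replace(tag, "").strip(" ,.")
    return None, caption

print(split_rating("a cat sleeping on a windowsill, PG_RATING"))
# ('PG_RATING', 'a cat sleeping on a windowsill')
```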

## How to Get Started with the Model

```py
import requests
from PIL import Image
from transformers import AutoProcessor, BlipForConditionalGeneration

# The processor comes from the base BLIP checkpoint; the fine-tuned weights are BrainBLIP.
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("braintacles/brainblip").to("cuda")

# Point this at a local file path or an http(s) URL.
image_path_or_url = r"https://imagePath_or_url.jpg"
if image_path_or_url.startswith("http"):
    raw_image = Image.open(requests.get(image_path_or_url, stream=True).raw)
else:
    raw_image = Image.open(image_path_or_url)
raw_image = raw_image.convert("RGB")  # BLIP expects 3-channel RGB input

inputs = processor(raw_image, return_tensors="pt").to("cuda")
out = model.generate(**inputs, min_length=40, max_new_tokens=75, num_beams=5, repetition_penalty=1.40)
caption = processor.decode(out[0], skip_special_tokens=True)
print(caption)
```
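
Since the model is aimed at captioning text-to-image training sets, the usual workflow is a batch loop over a folder of images. The sketch below assumes the images live in a local `./dataset` folder and that captions are written as `.txt` sidecar files next to each image, a convention many text-to-image trainers accept:

```py
from pathlib import Path

from PIL import Image
from transformers import AutoProcessor, BlipForConditionalGeneration

processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("braintacles/brainblip").to("cuda")

image_dir = Path("./dataset")  # assumed location of the training images
for path in sorted(image_dir.iterdir()):
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    raw_image = Image.open(path).convert("RGB")
    inputs = processor(raw_image, return_tensors="pt").to("cuda")
    # Same generation settings as the single-image example above.
    out = model.generate(**inputs, min_length=40, max_new_tokens=75,
                         num_beams=5, repetition_penalty=1.40)
    caption = processor.decode(out[0], skip_special_tokens=True)
    # Write the caption as a .txt sidecar next to the image.
    path.with_suffix(".txt").write_text(caption, encoding="utf-8")
```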

## Training Details

### Training Data

All captions in this dataset were written by hand by me, with occasional help from GPT-4.

Very special thanks to the following people, who have also contributed a huge amount of time hand-captioning data:

- [Temporarium](https://civitai.com/user/Temporarium)
- [HailoKnight](https://civitai.com/user/HailoKnight)