Florence-2-base-PromptGen v1.5 (with config and code updates)

This is MiaoshouAI/Florence-2-base-PromptGen-v1.5 with its existing features retained, but with changes to the supporting configuration and code so that it works as a drop-in replacement for Microsoft's Florence-2-base model when using the Transformers library in Python.

config.json has been updated with an auto_map property, and key-value pairs have been added to match Florence-2-base, resolving this issue.
The Python code that sits in the root of the repo in Florence-2-base was located in florence2_base_ft in MiaoshouAI/Florence-2-base-PromptGen-v1.5. It has been moved to the root of this repo, because the subfolder location prevented trust_remote_code=True in AutoProcessor.from_pretrained from loading the code.
Florence-2-base's modeling_florence2.py has been changed so that the class Florence2LanguageForConditionalGeneration inherits from GenerationMixin as a second base class alongside PreTrainedModel, for compatibility with transformers v4.50 onwards.
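
The first two changes can be checked on a local clone with a small script. The following is a minimal sketch, not part of this repo: the function name check_remote_code_layout is hypothetical, and it assumes every auto_map value in config.json is a string of the form "module_name.ClassName" that refers to a Python file in the repo root.

```python
import json
from pathlib import Path


def check_remote_code_layout(repo_dir):
    """Return a list of problems that could stop trust_remote_code loading.

    Hypothetical helper: assumes auto_map values look like
    "module_name.ClassName" and that the module lives in the repo root.
    """
    repo = Path(repo_dir)
    config = json.loads((repo / "config.json").read_text(encoding="utf-8"))

    auto_map = config.get("auto_map")
    if not auto_map:
        return ["config.json has no auto_map entry"]

    problems = []
    for auto_class, target in auto_map.items():
        # Split "modeling_florence2.SomeClass" into its module name,
        # then look for the matching .py file next to config.json.
        module_name = target.rsplit(".", 1)[0]
        module_file = repo / f"{module_name}.py"
        if not module_file.is_file():
            problems.append(
                f"{auto_class}: {module_file.name} not found in repo root"
            )
    return problems
```

An empty list means each module named in auto_map was found next to config.json. A non-empty list names the Auto class whose code file is missing from the root, which is the situation the move out of florence2_base_ft was meant to fix.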

About PromptGen

Florence-2-base-PromptGen is a model trained by MiaoshouAI that specializes in generating highly descriptive prompts and tags. These help with training image generation models such as FLUX.1-dev and with writing descriptive prompts for image generation.

Supported prompts include standard prompts from Florence-2-base, such as <OD> for identifying object locations, and enhanced prompts by MiaoshouAI, including <GENERATE_TAGS>, <CAPTION>, <DETAILED_CAPTION>, <MORE_DETAILED_CAPTION> and <MIXED_CAPTION>. See the original repo for more details.

How to use:

To use this model, you can load it directly from the Hugging Face Model Hub.

To run it as an API server on Windows or Linux, with command-line clients (including fast captioning of all images in a folder), you can use Florence2 Vision API Server.

First, install the dependencies (in a virtual environment if you prefer), for example:

pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip3 install transformers pillow einops timm

The following code is based on the microsoft/Florence-2-base example, but with an updated prompt and model, and corrected imports.

import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM 


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model = AutoModelForCausalLM.from_pretrained("createveai/Florence-2-base-PromptGen-v1.5", torch_dtype=torch_dtype, trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained("createveai/Florence-2-base-PromptGen-v1.5", trust_remote_code=True)

# Examples include <CAPTION>, <DETAILED_CAPTION>, <MORE_DETAILED_CAPTION>, <GENERATE_TAGS>, <MIXED_CAPTION>, <OD>
prompt = "<CAPTION>"

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, torch_dtype)

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
    do_sample=False
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
parsed_answer = processor.post_process_generation(generated_text, task=prompt, image_size=(image.width, image.height))

print(parsed_answer[prompt])
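
To try several prompts on the same image, the steps above can be wrapped in a function. This is a minimal sketch rather than part of the model's own code: run_prompts is a hypothetical name, it expects the model, processor and image objects already loaded as in the example above, and it assumes every prompt passed in is one that processor.post_process_generation accepts as a task.

```python
def run_prompts(model, processor, image, prompts, device="cpu", dtype=None,
                max_new_tokens=1024):
    """Run each task prompt on one image and return {prompt: parsed result}.

    Hypothetical helper: model and processor are the objects loaded with
    from_pretrained in the example above; image is a PIL image.
    """
    results = {}
    for prompt in prompts:
        inputs = processor(text=prompt, images=image, return_tensors="pt")
        # Move tensors to the target device (and dtype, when one is given).
        inputs = inputs.to(device, dtype) if dtype is not None else inputs.to(device)

        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=max_new_tokens,
            num_beams=3,
            do_sample=False,
        )
        # Keep special tokens so post-processing can locate the task output.
        text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
        parsed = processor.post_process_generation(
            text, task=prompt, image_size=(image.width, image.height)
        )
        results[prompt] = parsed[prompt]
    return results
```

With the objects from the example above, a call such as run_prompts(model, processor, image, ["<CAPTION>", "<GENERATE_TAGS>"], device=device, dtype=torch_dtype) would return one entry per prompt.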
