Visor - Natural-Language Anime Tagging

Visor is an image tagging model that describes images in natural language. It is built on the BLIP model architecture.

One potential use case is captioning anime images to build training data for diffusion models.
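The captioning workflow described above can be sketched in Python. This is a minimal sketch under assumptions that this card does not state. First, it assumes the checkpoint at shadowlilac/visor loads with the Hugging Face transformers BLIP captioning classes (BlipProcessor, BlipForConditionalGeneration). Second, it assumes the diffusion training script expects one same-named .txt caption file next to each image, which is a convention many fine-tuning tools use. The helper names make_visor_captioner and write_caption_sidecars, the max_new_tokens value, and the accepted file extensions are hypothetical illustrations and are not part of Visor.

```python
from pathlib import Path
from typing import Callable

# Hypothetical choice: image extensions treated as training images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def make_visor_captioner(repo_id: str = "shadowlilac/visor",
                         max_new_tokens: int = 75) -> Callable[[Path], str]:
    """Return a function that captions one image file.

    ASSUMPTION: the checkpoint loads with transformers' BLIP captioning
    classes; this card does not confirm it. max_new_tokens=75 is arbitrary.
    Imports are local so the rest of this sketch works without them.
    """
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(repo_id)
    model = BlipForConditionalGeneration.from_pretrained(repo_id)
    model.eval()

    def caption(image_path: Path) -> str:
        image = Image.open(image_path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return processor.decode(output_ids[0], skip_special_tokens=True)

    return caption


def write_caption_sidecars(image_dir, caption_fn: Callable[[Path], str],
                           overwrite: bool = False) -> int:
    """Write <image stem>.txt beside each image and return how many were written.

    Existing caption files are kept unless overwrite=True.
    """
    written = 0
    # sorted() materializes the listing first, so new .txt files are not revisited.
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists() and not overwrite:
            continue
        caption = caption_fn(image_path).strip()
        caption_path.write_text(caption + "\n", encoding="utf-8")
        written += 1
    return written
```

Keeping the captioner as a plain callable means write_caption_sidecars can run, and be tested, with any captioning function. If the assumptions above hold, usage would be write_caption_sidecars("dataset/", make_visor_captioner()).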

Model details: 470M parameters, BF16 tensors, distributed in Safetensors format.

Hugging Face repository: shadowlilac/visor