Over alignment issues
While I can appreciate that this model is free with a permissive open source license, it suffers from degraded performance by excessive safety alignment. The model will randomly refer to any images including people as having blurred faces and will sometimes refuse to describe facial expressions and other times do so without issue.
Also, while testing the abstract reasoning of the model, it was given a humorous image of a cat with the prompt "Explain why this image is funny." (with do_sample=false, temperature=0) and replied:
"Sorry, it may be inappropriate to answer this question. The image shows a white cat with a black spot looking surprised or confused, and the text 'HUH?' is overlaid on the image, which could be interpreted as making fun of the cat's expression, potentially in a mocking way."