---
base_model: Qwen/Qwen2.5-VL-3B-Instruct
datasets:
- davanstrien/iconclass-vlm-sft
- biglam/brill_iconclass
library_name: transformers
model_name: iconclass-vlm
tags:
- generated_from_trainer
- hf_jobs
- sft
- trl
- vision-language
- iconclass
- cultural-heritage
- art-classification
license: apache-2.0
pipeline_tag: image-text-to-text
---

# Model Card for iconclass-vlm

This model is a fine-tuned version of [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) on the [davanstrien/iconclass-vlm-sft](https://huggingface.co/datasets/davanstrien/iconclass-vlm-sft) dataset.

You can explore the predictions of this model using this [Space](https://huggingface.co/spaces/davanstrien/iconclass-predictions).

**Note:** this model is a work in progress. The goal is to see how far small models can be pushed to excel at this kind of specific but challenging task, so the base model may change over time.

## Model Description

This vision-language model has been fine-tuned to generate [Iconclass](https://iconclass.org/) classification codes from images. Iconclass is a comprehensive classification system for describing the content of images, used particularly in cultural heritage and art history contexts.

The model was trained using Supervised Fine-Tuning (SFT) with [TRL](https://github.com/huggingface/trl) on a reformatted version of the Brill Iconclass AI Test Set, which contains 87,744 images with expert-assigned Iconclass labels.
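Iconclass notations are hierarchical: each additional character of a code such as `25F23` narrows the subject, and bracketed parts such as `(LION)` add a further refinement. The sketch below shows one way to pull Iconclass-like codes out of the model's generated text and expand each into its broader levels, which can help when grouping or comparing predictions. It is a hypothetical helper, not part of this model or of any library: the names `extract_codes` and `ancestors` are illustrative, the regular expression only approximates the notation (it deliberately skips bare one- and two-digit codes so that stray numbers are not picked up), and treating every character as one level is a simplification that may not hold everywhere in the scheme.

```python
import re

# Hypothetical approximation of an Iconclass notation with at least three levels:
# two digits, one or two capital letters (J is not used), optional digits,
# then optional bracketed refinements such as "(LION)" or "(+12)".
ICONCLASS_PATTERN = re.compile(r"(?<![0-9A-Za-z])\d{2}[A-IK-Z]{1,2}\d*(?:\([^()]*\))*")


def extract_codes(generated_text: str) -> list[str]:
    """Return Iconclass-like codes found in free text, in order, without duplicates."""
    seen: dict[str, None] = {}
    for match in ICONCLASS_PATTERN.finditer(generated_text):
        seen.setdefault(match.group(0), None)
    return list(seen)


def ancestors(code: str) -> list[str]:
    """Expand a code into its broader levels, treating each base character as one level."""
    base = code.split("(", 1)[0]  # drop bracketed refinements
    levels = [base[:i] for i in range(1, len(base) + 1)]
    if code != base:
        levels.append(code)  # keep the bracketed refinement as the most specific level
    return levels


if __name__ == "__main__":
    # Example text; the real output format of the model is not assumed here.
    output = "25F23(LION), 31A2351\n11H(JEROME)"
    for code in extract_codes(output):
        print(code, "->", ancestors(code))
```

With this example input, `25F23(LION)` expands to `2`, `25`, `25F`, `25F2`, `25F23` and `25F23(LION)`. Matching codes anywhere in the text, rather than parsing a fixed output layout, keeps the helper usable whatever separator the model emits.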
## Intended Use

- **Primary use case**: Automatic classification of art and cultural heritage images using Iconclass notation
- **Users**: Digital humanities researchers, museum professionals, art historians, and developers working with cultural heritage collections

## Quick Start

### Simple Pipeline Approach

```python
from transformers import pipeline
from PIL import Image

# Load the image-text-to-text pipeline
pipe = pipeline("image-text-to-text", model="davanstrien/iconclass-vlm")

# Load your image
image = Image.open("your_artwork.jpg")

# Prepare chat-style messages: one image plus the text prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Generate Iconclass labels for this image"},
        ],
    }
]

# Generate with beam search for better results; return only the new text
output = pipe(
    text=messages,
    max_new_tokens=800,
    return_full_text=False,
    generate_kwargs={"num_beams": 4},
)
print(output[0]["generated_text"])
```

### Alternative Approach with AutoModel

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

model_name = "davanstrien/iconclass-vlm"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(model_name)

# Load your image
image = Image.open("your_artwork.jpg")

# Prepare chat messages; {"type": "image"} marks where the image goes
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Generate Iconclass labels for this image"},
        ],
    }
]

# Render the chat template to a prompt string, then process text and image together
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate with beam search
outputs = model.generate(**inputs, max_new_tokens=800, num_beams=4)

# Decode only the newly generated tokens, skipping the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = processor.decode(new_tokens, skip_special_tokens=True)
print(response)
```

### Training Dataset

The model was trained on a reformatted version of the Brill Iconclass AI Test Set ([biglam/brill_iconclass](https://huggingface.co/datasets/biglam/brill_iconclass)). The dataset was reformatted into a message format suitable for SFT training.

### Training Procedure

This model was trained with SFT (Supervised Fine-Tuning).
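The reformatting step described under Training Dataset can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used to build davanstrien/iconclass-vlm-sft: the function name `to_sft_example` is hypothetical, the prompt wording is borrowed from the Quick Start examples, and serializing the labels one per line is a guess. The `messages`/`images` layout follows the common convention for vision-language SFT datasets used with TRL.

```python
from typing import Any

# Assumption: training used the same prompt wording as the Quick Start examples.
PROMPT = "Generate Iconclass labels for this image"


def to_sft_example(image: Any, labels: list[str]) -> dict[str, Any]:
    """Convert one (image, labels) record into a hypothetical chat-style SFT example."""
    return {
        "images": [image],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},  # placeholder resolved against the "images" list
                    {"type": "text", "text": PROMPT},
                ],
            },
            {
                "role": "assistant",
                # Assumption: the target is the expert labels, one code per line.
                "content": [{"type": "text", "text": "\n".join(labels)}],
            },
        ],
    }


example = to_sft_example("artwork.jpg", ["25F23(LION)", "31A2351"])
print(example["messages"][1]["content"][0]["text"])
```

Referencing the image with a `{"type": "image"}` placeholder, rather than embedding it in the text, matches the message structure of the AutoModel example in Quick Start; whether training followed exactly this structure is an assumption.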
### Framework Versions

```
TRL: 0.22.1
Transformers: 4.55.2
PyTorch: 2.8.0
Datasets: 4.0.0
Tokenizers: 0.21.4
```

### Limitations and Biases

The Iconclass classification system reflects biases from its creation period (1940s Netherlands). Certain categories, particularly those related to human classification, may contain outdated or problematic terminology. Model performance may vary on images outside the Western art tradition due to the dataset composition.

### Citations

Training framework

```bibtex
@misc{vonwerra2022trl,
  title        = {{TRL: Transformer Reinforcement Learning}},
  author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year         = 2020,
  journal      = {GitHub repository},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```

Dataset

```bibtex
@misc{iconclass,
  title  = {Brill Iconclass AI Test Set},
  author = {Etienne Posthumus},
  year   = {2020}
}
```