---
base_model: Qwen/Qwen2.5-VL-3B-Instruct
datasets:
- davanstrien/iconclass-vlm-sft
- biglam/brill_iconclass
library_name: transformers
model_name: iconclass-vlm
tags:
- generated_from_trainer
- hf_jobs
- sft
- trl
- vision-language
- iconclass
- cultural-heritage
- art-classification
license: apache-2.0
pipeline_tag: image-text-to-text
---

# Model Card for iconclass-vlm

This model is a fine-tuned version of [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) on the [davanstrien/iconclass-vlm-sft](https://huggingface.co/datasets/davanstrien/iconclass-vlm-sft) dataset.

You can explore the predictions of this model using this [Space](https://huggingface.co/spaces/davanstrien/iconclass-predictions).

**Note:** This model is a work in progress. The goal is to see how well a small model can be trained to perform this kind of specific but challenging task, so the base model may change over time.

## Model Description

This vision-language model has been fine-tuned to generate [Iconclass](https://iconclass.org/) classification codes from images. Iconclass is a comprehensive classification system for describing the content of images, particularly used in cultural heritage and art history contexts.

The model was trained using Supervised Fine-Tuning (SFT) with [TRL](https://github.com/huggingface/trl) on a reformatted version of the Brill Iconclass AI Test Set, which contains 87,744 images with expert-assigned Iconclass labels.

## Intended Use

- **Primary use case**: Automatic classification of art and cultural heritage images using Iconclass notation
- **Users**: Digital humanities researchers, museum professionals, art historians, and developers working with cultural heritage collections

## Quick Start

### Simple Pipeline Approach

```python
from transformers import pipeline
from PIL import Image

# Load pipeline
pipe = pipeline("image-text-to-text", model="davanstrien/iconclass-vlm")

# Load your image
image = Image.open("your_artwork.jpg")

# Prepare messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Generate Iconclass labels for this image"}
        ]
    }
]

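# Generate with beam search; return_full_text=False returns only the model's reply
# instead of the whole conversation
output = pipe(text=messages, max_new_tokens=800, num_beams=4, return_full_text=False)
print(output[0]["generated_text"])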
```

### Alternative Approach with AutoModel

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

model_name = "davanstrien/iconclass-vlm"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModelForVision2Seq.from_pretrained(model_name)

# Load your image
image = Image.open("your_artwork.jpg")

# Prepare inputs
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Generate Iconclass labels for this image"}
        ]
    }
]

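# Render the chat template to a prompt string; the processor expects text, not the raw message list
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt")

# Generate with beam search
outputs = model.generate(**inputs, max_new_tokens=800, num_beams=4)

# Decode only the newly generated tokens, skipping the prompt
generated = outputs[0][inputs["input_ids"].shape[1]:]
response = processor.decode(generated, skip_special_tokens=True)
print(response)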
```
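
### Splitting the Output into Codes

Both examples above print the raw generated text. The helper below is a minimal sketch for turning that text into a list of individual codes. It assumes the model separates codes with commas or newlines, which is not confirmed by this card, so check a few real outputs before relying on it. `split_codes` is a hypothetical helper name, and the codes in the usage line are illustrative only.

```python
def split_codes(text: str) -> list[str]:
    """Split generated text into Iconclass codes.

    Assumption: codes are separated by commas or newlines. Separators inside
    parentheses are kept, because bracketed text such as 25F23(LION) is part
    of the notation itself.
    """
    codes, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch in ",\n" and depth == 0:
            # Separator outside parentheses: close the current code
            codes.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    codes.append("".join(current).strip())
    # Drop empty fragments and duplicates while keeping first-seen order
    return list(dict.fromkeys(code for code in codes if code))


print(split_codes("31A235, 25F23(LION), 11H(JEROME), 31A235"))
# ['31A235', '25F23(LION)', '11H(JEROME)']
```

Tracking parenthesis depth, rather than calling `str.split(",")`, keeps a code whole when its bracketed text contains a comma.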

### Training Dataset
The model was trained on [davanstrien/iconclass-vlm-sft](https://huggingface.co/datasets/davanstrien/iconclass-vlm-sft), a reformatted version of the Brill Iconclass AI Test Set ([biglam/brill_iconclass](https://huggingface.co/datasets/biglam/brill_iconclass)).

The dataset was converted into a chat-message format suitable for SFT training.
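
As a rough illustration of what such a conversion could look like, the sketch below maps one image and label record to a user/assistant message pair. It is not the script used to build the training dataset. The `image` and `label` field names, the `to_sft_messages` helper, and the comma-joined target string are all assumptions; the prompt text is taken from the Quick Start examples.

```python
from typing import Any

# Prompt taken from the Quick Start examples in this card
PROMPT = "Generate Iconclass labels for this image"


def to_sft_messages(record: dict[str, Any]) -> dict[str, Any]:
    """Convert one record into a chat-style SFT example.

    Assumptions: record["label"] is a list of Iconclass notation strings and
    the training target is those codes joined with ", ". The real dataset may
    serialise the labels differently.
    """
    target = ", ".join(record["label"])
    return {
        "images": [record["image"]],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image"},
                    {"type": "text", "text": PROMPT},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": target}],
            },
        ],
    }


# Usage with a placeholder in place of a real PIL image
example = to_sft_messages(
    {"image": "placeholder-image", "label": ["31A235", "25F23(LION)"]}
)
print(example["messages"][1]["content"][0]["text"])
# 31A235, 25F23(LION)
```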


### Training Procedure

This model was trained with SFT (Supervised Fine-Tuning).

Framework Versions
```
TRL: 0.22.1
Transformers: 4.55.2
PyTorch: 2.8.0
Datasets: 4.0.0
Tokenizers: 0.21.4
```

### Limitations and Biases

The Iconclass classification system reflects biases from its creation period (1940s Netherlands).
Certain categories, particularly those related to human classification, may contain outdated or problematic terminology.
Model performance may vary on images outside the Western art tradition due to the dataset composition.

### Citations

Training framework

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```

Dataset

```bibtex
@misc{iconclass,
    title = {Brill Iconclass AI Test Set},
    author = {Etienne Posthumus},
    year = {2020}
}
```