metadata

dataset_info:
  features:
    - name: hazard_category
      dtype: string
    - name: hazard_subcategory
      dtype: string
    - name: hazard_subsubcategory
      dtype: string
    - name: case_id
      dtype: string
    - name: case_text
      dtype: string
    - name: unsafe_image_id
      dtype: string
    - name: unsafe_image_description
      dtype: string
    - name: prompt_text
      dtype: string
    - name: prompt_type
      dtype: string
    - name: unsafe_image_url
      dtype: string
    - name: unsafe_image_license
      dtype: string
    - name: unsafe_image_cw
      dtype: string
  splits:
    - name: german
      num_bytes: 70718
      num_examples: 200
    - name: russian
      num_bytes: 76499
      num_examples: 200
    - name: chinese
      num_bytes: 70778
      num_examples: 200
    - name: hindi
      num_bytes: 84054
      num_examples: 200
    - name: spanish
      num_bytes: 70689
      num_examples: 200
    - name: italian
      num_bytes: 69545
      num_examples: 200
    - name: french
      num_bytes: 73103
      num_examples: 200
    - name: english
      num_bytes: 139996
      num_examples: 400
    - name: korean
      num_bytes: 73217
      num_examples: 200
    - name: arabic
      num_bytes: 71779
      num_examples: 200
    - name: farsi
      num_bytes: 75732
      num_examples: 200
  download_size: 351210
  dataset_size: 876110
configs:
  - config_name: default
    data_files:
      - split: german
        path: data/german-*
      - split: russian
        path: data/russian-*
      - split: chinese
        path: data/chinese-*
      - split: hindi
        path: data/hindi-*
      - split: spanish
        path: data/spanish-*
      - split: italian
        path: data/italian-*
      - split: french
        path: data/french-*
      - split: english
        path: data/english-*
      - split: korean
        path: data/korean-*
      - split: arabic
        path: data/arabic-*
      - split: farsi
        path: data/farsi-*
license: cc-by-4.0
language:
  - ar
  - fr
  - en
  - de
  - zh
  - ko
  - fa
  - hi
  - it
  - ru
  - es
size_categories:
  - 1K<n<10K
task_categories:
  - image-text-to-text

Dataset Card for the MSTS Benchmark

The paper is available at https://arxiv.org/abs/2501.10057, and the code is in the accompanying GitHub repo. To reproduce the exact results, use the download and preprocessing scripts for the images provided in that repo.

Example usage:

from datasets import load_dataset

# load all language splits at once
ds = load_dataset("felfri/MSTS")

# or load a single language split (split names are listed in the metadata above)
lang = "german"
ds = load_dataset("felfri/MSTS", split=lang)
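
Once the data is loaded, a quick sanity check and a per-column tally can help with orientation. The following is a minimal sketch, not part of the official MSTS code: the helper names `count_by_column`, `check_split_sizes`, and `summarize_msts` are hypothetical, the split sizes are copied from this card's metadata, and the column names are taken from the feature list above.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Split sizes as declared in this card's metadata (num_examples per split).
EXPECTED_SPLIT_SIZES: Dict[str, int] = {
    "german": 200, "russian": 200, "chinese": 200, "hindi": 200,
    "spanish": 200, "italian": 200, "french": 200, "english": 400,
    "korean": 200, "arabic": 200, "farsi": 200,
}


def count_by_column(rows: Iterable[Mapping[str, str]], column: str) -> Counter:
    """Tally how many rows share each value of `column`."""
    return Counter(row[column] for row in rows)


def check_split_sizes(actual: Mapping[str, int]) -> Dict[str, tuple]:
    """Return {split: (expected, actual)} for every split whose size differs."""
    return {
        name: (expected, actual.get(name))
        for name, expected in EXPECTED_SPLIT_SIZES.items()
        if actual.get(name) != expected
    }


def summarize_msts(repo_id: str = "felfri/MSTS", column: str = "hazard_category") -> None:
    """Print split-size mismatches and per-split tallies (needs network access)."""
    # Imported lazily so the helpers above stay dependency-free.
    from datasets import load_dataset

    ds = load_dataset(repo_id)
    print(check_split_sizes({name: split.num_rows for name, split in ds.items()}))
    for name, split in ds.items():
        print(name, count_by_column(split, column))
```

Calling `summarize_msts()` downloads all splits and prints an empty dict if the split sizes match the metadata, followed by one tally per language. Passing `column="prompt_type"` tallies a different field in the same way.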

Disclaimer

The MSTS dataset contains content that may be offensive or upsetting in nature. Topics include, but are not limited to, discriminatory language and discussions of abuse, violence, self-harm, exploitation, and other potentially upsetting subject matter. Please only engage with the data in accordance with your own personal risk tolerance. The data are intended for research purposes, especially research that can make models less harmful.

Citation Information

Please consider citing our work if you use data and/or code from this repository.

@misc{röttger2025mstsmultimodalsafetytest,
      title={MSTS: A Multimodal Safety Test Suite for Vision-Language Models}, 
      author={Paul Röttger and Giuseppe Attanasio and Felix Friedrich and Janis Goldzycher and Alicia Parrish and Rishabh Bhardwaj and Chiara Di Bonaventura and Roman Eng and Gaia El Khoury Geagea and Sujata Goswami and Jieun Han and Dirk Hovy and Seogyeong Jeong and Paloma Jeretič and Flor Miriam Plaza-del-Arco and Donya Rooein and Patrick Schramowski and Anastassia Shaitarova and Xudong Shen and Richard Willats and Andrea Zugarini and Bertie Vidgen},
      year={2025},
      eprint={2501.10057},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.10057}, 
}