---
dataset_info:
  features:
  - name: hazard_category
    dtype: string
  - name: hazard_subcategory
    dtype: string
  - name: hazard_subsubcategory
    dtype: string
  - name: case_id
    dtype: string
  - name: case_text
    dtype: string
  - name: unsafe_image_id
    dtype: string
  - name: unsafe_image_description
    dtype: string
  - name: prompt_text
    dtype: string
  - name: prompt_type
    dtype: string
  - name: unsafe_image_url
    dtype: string
  - name: unsafe_image_license
    dtype: string
  - name: unsafe_image_cw
    dtype: string
  splits:
  - name: german
    num_bytes: 70718
    num_examples: 200
  - name: russian
    num_bytes: 76499
    num_examples: 200
  - name: chinese
    num_bytes: 70778
    num_examples: 200
  - name: hindi
    num_bytes: 84054
    num_examples: 200
  - name: spanish
    num_bytes: 70689
    num_examples: 200
  - name: italian
    num_bytes: 69545
    num_examples: 200
  - name: french
    num_bytes: 73103
    num_examples: 200
  - name: english
    num_bytes: 139996
    num_examples: 400
  - name: korean
    num_bytes: 73217
    num_examples: 200
  - name: arabic
    num_bytes: 71779
    num_examples: 200
  - name: farsi
    num_bytes: 75732
    num_examples: 200
  download_size: 351210
  dataset_size: 876110
configs:
- config_name: default
  data_files:
  - split: german
    path: data/german-*
  - split: russian
    path: data/russian-*
  - split: chinese
    path: data/chinese-*
  - split: hindi
    path: data/hindi-*
  - split: spanish
    path: data/spanish-*
  - split: italian
    path: data/italian-*
  - split: french
    path: data/french-*
  - split: english
    path: data/english-*
  - split: korean
    path: data/korean-*
  - split: arabic
    path: data/arabic-*
  - split: farsi
    path: data/farsi-*
license: cc-by-4.0
language:
- ar
- fr
- en
- de
- zh
- ko
- fa
- hi
- it
- ru
- es
size_categories:
- 1K<n<10K
task_categories:
- image-text-to-text
---
# Dataset Card for the MSTS Benchmark
See our [paper](https://huggingface.co/papers/2501.10057) and [code](https://github.com/paul-rottger/msts-multimodal-safety). To reproduce the exact results, use the GitHub repo, which provides download and preprocessing scripts for the images.
Example usage:
```python
from datasets import load_dataset

# Load all language splits at once
ds = load_dataset("felfri/MSTS")

# Or load a single language split, e.g. German
lang = "german"
ds = load_dataset("felfri/MSTS", split=lang)
```
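Once a split is loaded, a quick way to get oriented is to count rows per value of one of the string columns listed in the metadata (for example `hazard_category` or `prompt_type`). The sketch below is only an illustration: `count_by_field` is a hypothetical helper, not part of the MSTS code, and the rows and their values are made-up placeholders so the snippet runs without downloading anything.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_field(rows: Iterable[Mapping[str, str]], field: str) -> Counter:
    """Count how many rows share each distinct value of `field`."""
    return Counter(row[field] for row in rows)


# Placeholder rows for illustration only. The keys match column names from
# the dataset metadata; the values are invented and do not come from MSTS.
example_rows = [
    {"hazard_category": "category_a", "prompt_type": "type_x"},
    {"hazard_category": "category_a", "prompt_type": "type_y"},
    {"hazard_category": "category_b", "prompt_type": "type_x"},
]

category_counts = count_by_field(example_rows, "hazard_category")
for category, n in category_counts.most_common():
    print(f"{category}: {n}")
```

Because iterating over a loaded split yields one dictionary per row, the same helper should also work on real data, e.g. `count_by_field(load_dataset("felfri/MSTS", split="english"), "hazard_category")`.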
## Disclaimer
The MSTS dataset **contains content that may be offensive or upsetting in nature**. Topics include, but are not limited to, **discriminatory language and discussions of abuse, violence, self-harm, exploitation, and other potentially upsetting subject matter**.
Please only engage with the data in accordance with your own personal risk tolerance. The data are intended for research purposes, especially research that can make models less harmful.
## Citation Information
Please consider citing our work if you use data and/or code from this repository.
```bibtex
@misc{röttger2025mstsmultimodalsafetytest,
title={MSTS: A Multimodal Safety Test Suite for Vision-Language Models},
author={Paul Röttger and Giuseppe Attanasio and Felix Friedrich and Janis Goldzycher and Alicia Parrish and Rishabh Bhardwaj and Chiara Di Bonaventura and Roman Eng and Gaia El Khoury Geagea and Sujata Goswami and Jieun Han and Dirk Hovy and Seogyeong Jeong and Paloma Jeretič and Flor Miriam Plaza-del-Arco and Donya Rooein and Patrick Schramowski and Anastassia Shaitarova and Xudong Shen and Richard Willats and Andrea Zugarini and Bertie Vidgen},
year={2025},
eprint={2501.10057},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.10057},
}
```