InterleavedBench (EMNLP'24 Main Conference)
This is the official Hugging Face repo for the paper "Holistic Evaluation for Interleaved Text-and-Image Generation", accepted to the EMNLP 2024 Main Conference.
Paper: https://arxiv.org/abs/2406.14643
Website: https://vt-nlp.github.io/InterleavedEval/
How to use InterleavedBench
Repo hierarchy
- interleaved_bench.json is the main JSON file of the dataset.
- zipped_images is the directory of zipped images for each subset, including the images for the context and ground truths.
- src/interleavedeval_gpt4o.py is the Python script for InterleavedEval with GPT-4o. Its input is the model prediction file.
To get started
- Unzip the image files under zipped_images.
- Run inference on interleaved_bench.json with your model to get its output (including text and images).
- Use the script src/interleavedeval_gpt4o.py to perform the evaluation.
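The unzip step can be sketched in Python as below. This is a minimal sketch, not part of the official repo: it assumes the archives under zipped_images are regular .zip files, and the function name unzip_all and the output directory are hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_all(zip_dir: str, out_dir: str) -> list[str]:
    """Extract every .zip archive found directly under zip_dir into out_dir.

    Assumption: the archives are plain .zip files; the README only says
    the images are "zipped".
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out)
        extracted.append(archive.name)
    # Return the archive names so the caller can check nothing was missed.
    return extracted
```

Calling unzip_all("zipped_images", "images") would extract each archive into an images directory; the output path is a placeholder and should match whatever your inference code expects.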
Important notes
- For image editing and subject-driven generation tasks, the scores on text-related aspects (text quality, text-image coherence) are directly set to 0. Please skip those scores when you compute the overall performance.
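One way to apply this note when aggregating scores is sketched below. It is an illustration under stated assumptions: the aspect key names are hypothetical (the note names only text quality and text-image coherence), and treating the overall score as a plain mean over the remaining aspects is an assumption, not necessarily how the paper aggregates.

```python
# Hypothetical key names for the two text-related aspects named in the note.
TEXT_ASPECTS = {"text_quality", "text_image_coherence"}


def overall_score(scores: dict[str, float], skip_text_aspects: bool) -> float:
    """Average per-aspect scores, optionally ignoring the text-related aspects.

    Set skip_text_aspects=True for image editing and subject-driven
    generation samples, whose text-related scores are fixed at 0.
    """
    kept = [
        value
        for aspect, value in scores.items()
        if not (skip_text_aspects and aspect in TEXT_ASPECTS)
    ]
    if not kept:
        raise ValueError("no aspects left to average")
    return sum(kept) / len(kept)
```

Without the skip, the zeroed text scores would pull the average down for tasks where no text output is expected.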
One example in interleaved_bench.json is as follows:
{
  "id": "wikihow_next_step_0_489157",
  "image": [
    "wiki_images_test/489157_0_0.png",
    "wiki_images_test/489157_0_1.png",
    "wiki_images_test/489157_0_2.png",
    "wiki_images_test/489157_0_3.png",
    "wiki_images_test/489157_0_4.png"
  ],
  "task_name": "wikihow_next_step",
  "conversations": [
    {
      "from": "human",
      "value": "In this task, you are given a high-level goal 'How to Make a Banana Shake': Banana shakes are a tasty way to get a lot of nutrients all at once. Bananas provide a creamy, smooth texture when turned into a drink. Bananas also fill empty stomachs, staving off hunger pangs and giving you a nice energy burst. In this article you'll find a few ways to make banana shakes, among the many possibilities. \n You need to assist human user to complete this task via making a banana shake with kefir. Given the previous steps, you need to predict the subsequent 4 steps to help the user to finish the task. The previous steps are: \n <BEGIN> Put 2 to 3 bananas in a bowl. <image>\n"
    },
    {
      "from": "gpt",
      "value": "Now put in a liter of kefir and a teaspoon of sugar. <image>\n Put 1 cup milk into the mix. <image>\n Using a blender, blend all ingredients together. <image>\n Relax with your fresh banana smoothie! <image>\n"
    }
  ],
  "goal": "How to Make a Banana Shake",
  "category": [
    "Food and Entertaining",
    "Drinks",
    "Smoothies Shakes and Milk",
    "Fruit Based Shakes"
  ],
  "dataset_id": "wikihow_selected_test_uni"
},
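An entry like the one above can be inspected with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, the top level of interleaved_bench.json is assumed to be a list of such entries, and the pairing of image paths with <image> placeholders in order is inferred from this one example (five paths, five placeholders) rather than stated by the authors.

```python
import json

IMAGE_TOKEN = "<image>"


def load_entries(path: str) -> list[dict]:
    """Load the benchmark file; assumes the top level is a JSON list of entries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_turn(value: str) -> list[str]:
    """Split one conversation turn into its non-empty text segments around <image>."""
    return [part.strip() for part in value.split(IMAGE_TOKEN) if part.strip()]


def summarize(entry: dict) -> dict:
    """Compare the number of image paths with the number of <image> placeholders."""
    placeholders = sum(
        turn["value"].count(IMAGE_TOKEN) for turn in entry["conversations"]
    )
    return {
        "id": entry["id"],
        "task_name": entry["task_name"],
        "num_images": len(entry["image"]),
        "num_placeholders": placeholders,
    }
```

For the sample entry, split_turn applied to the "gpt" turn yields the four ground-truth step texts, and summarize reports matching image and placeholder counts, which is a cheap sanity check before running inference.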
Reference
If you find our work useful or interesting, please cite:
@article{liu_holistic_2024,
author = {Minqian Liu and
Zhiyang Xu and
Zihao Lin and
Trevor Ashby and
Joy Rimchala and
Jiaxin Zhang and
Lifu Huang},
title = {Holistic Evaluation for Interleaved Text-and-Image Generation},
journal = {CoRR},
volume = {abs/2406.14643},
year = {2024},
url = {https://doi.org/10.48550/arXiv.2406.14643},
doi = {10.48550/ARXIV.2406.14643},
eprinttype = {arXiv},
eprint = {2406.14643},
timestamp = {Tue, 16 Jul 2024 16:17:50 +0200}
}