VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Paper • 2411.15115 • Published 2 days ago • 1
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding Paper • 2411.04952 • Published 17 days ago • 27 • 3
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation Paper • 2304.06671 • Published Apr 13, 2023
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback Paper • 2410.06215 • Published Oct 8
Unifying Vision-and-Language Tasks via Text Generation Paper • 2102.02779 • Published Feb 4, 2021
Self-Chained Image-Language Model for Video Localization and Question Answering Paper • 2305.06988 • Published May 11, 2023
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Models Paper • 2202.04053 • Published Feb 8, 2022
Visual Programming for Text-to-Image Generation and Evaluation Paper • 2305.15328 • Published May 24, 2023