arxiv:2510.08492

Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models

Published on Oct 9 · Submitted by Niels Rogge on Oct 13
AI-generated summary

UML, an unpaired multimodal learner, enhances representation learning in a target modality by leveraging auxiliary unpaired data from different modalities.

Abstract

Traditional multimodal learners find unified representations for tasks like visual question answering, but rely heavily on paired datasets. However, an overlooked yet potentially powerful question is: can one leverage auxiliary unpaired multimodal data to directly enhance representation learning in a target modality? We introduce UML: Unpaired Multimodal Learner, a modality-agnostic training paradigm in which a single model alternately processes inputs from different modalities while sharing parameters across them. This design exploits the assumption that different modalities are projections of a shared underlying reality, allowing the model to benefit from cross-modal structure without requiring explicit pairs. Theoretically, under linear data-generating assumptions, we show that unpaired auxiliary data can yield representations strictly more informative about the data-generating process than unimodal training. Empirically, we show that using unpaired data from auxiliary modalities -- such as text, audio, or images -- consistently improves downstream performance across diverse unimodal targets such as images and audio. Project page: https://unpaired-multimodal.github.io/
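
The abstract describes the training paradigm only at a high level, so the following is a minimal sketch of the core idea under stated assumptions: modality-specific projections feed a shared trunk, and training alternates between unpaired image and text batches so that both modalities update the same parameters. The module names, feature sizes, classification objective, and toy data below are illustrative assumptions, not the paper's exact recipe.

# A minimal sketch of UML-style alternating unpaired training (PyTorch).
# Architecture details, losses, and the strict batch-level alternation
# are assumptions for illustration; the paper's recipe may differ.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class UMLSketch(nn.Module):
    def __init__(self, dim=256, num_classes=10):
        super().__init__()
        # Modality-specific input projections into a common feature space
        # (hypothetical input feature sizes).
        self.image_proj = nn.Linear(2048, dim)  # e.g. pooled CNN features
        self.text_proj = nn.Linear(768, dim)    # e.g. pooled token embeddings
        # Shared trunk: the same parameters process every modality, so
        # gradients from auxiliary data shape the target representation.
        self.trunk = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                   nn.Linear(dim, dim))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, modality):
        proj = self.image_proj if modality == "image" else self.text_proj
        return self.head(self.trunk(proj(x)))

# Unpaired toy data: the image and text sets share no correspondence.
image_loader = DataLoader(
    TensorDataset(torch.randn(64, 2048), torch.randint(0, 10, (64,))),
    batch_size=16)
text_loader = DataLoader(
    TensorDataset(torch.randn(64, 768), torch.randint(0, 10, (64,))),
    batch_size=16)

model = UMLSketch()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Alternate modalities batch by batch; no (image, text) pairs are used.
for (img_x, img_y), (txt_x, txt_y) in zip(image_loader, text_loader):
    for x, y, modality in ((img_x, img_y, "image"), (txt_x, txt_y, "text")):
        opt.zero_grad()
        loss_fn(model(x, modality), y).backward()
        opt.step()

Because the trunk weights are shared, gradients from the auxiliary text batches also shape the representation used for the target image task, which is the mechanism the abstract attributes to cross-modal structure learned without explicit pairs.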

