arxiv:2412.03555

PaliGemma 2: A Family of Versatile VLMs for Transfer

Published on Dec 4 · Submitted by osanseviero on Dec 5

#1 Paper of the day

Authors: Michael Tschannen, Daniel Keysers, Yonatan Bitton, Alexey Gritsenko, Matthias Minderer, Anthony Sherbondy, Shangbang Long, Emanuele Bugliarello, Thomas Mesnard, Ibrahim Alabdulmohsin, Lucas Beyer, Xiaohua Zhai

Abstract

PaliGemma 2 is an upgrade of the PaliGemma open Vision-Language Model (VLM) based on the Gemma 2 family of language models. We combine the SigLIP-So400m vision encoder that was also used by PaliGemma with the whole range of Gemma 2 models, from the 2B one all the way up to the 27B model. We train these models at three resolutions (224px, 448px, and 896px) in multiple stages to equip them with broad knowledge for transfer via fine-tuning. The resulting family of base models covering different model sizes and resolutions allows us to investigate factors impacting transfer performance (such as learning rate) and to analyze the interplay between the type of task, model size, and resolution. We further increase the number and breadth of transfer tasks beyond the scope of PaliGemma including different OCR-related tasks such as table structure recognition, molecular structure recognition, music score recognition, as well as long fine-grained captioning and radiography report generation, on which PaliGemma 2 obtains state-of-the-art results.
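
The family described in the abstract can be pictured as a grid of language model sizes crossed with training resolutions. The following is a minimal sketch of that grid, not code from the paper. It assumes the intermediate Gemma 2 size is 9B (the abstract names only the 2B and 27B endpoints), and the helper names `family_grid` and `variant_label` are hypothetical. The abstract does not state whether every combination was trained, so the grid is an upper bound on the family.

```python
from itertools import product

# Gemma 2 language model sizes. The abstract names only the 2B and 27B
# endpoints; the 9B entry is an assumption based on the public Gemma 2 release.
GEMMA2_SIZES = ["2B", "9B", "27B"]

# Training resolutions listed in the abstract, in pixels.
RESOLUTIONS_PX = [224, 448, 896]


def family_grid(sizes=GEMMA2_SIZES, resolutions=RESOLUTIONS_PX):
    """Return every (language model size, resolution) pair, size-major order."""
    return list(product(sizes, resolutions))


def variant_label(size, resolution):
    """Build a hypothetical readable label; this is not an official checkpoint name."""
    return f"SigLIP-So400m + Gemma 2 {size} @ {resolution}px"


if __name__ == "__main__":
    for size, resolution in family_grid():
        print(variant_label(size, resolution))
```

Enumerating the grid this way makes it easy to loop a transfer experiment over sizes and resolutions, for example to compare how a task responds to a larger language model versus a higher input resolution.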

Community

osanseviero

Paper submitter · 7 days ago · edited 6 days ago

PaliGemma 2 paper is here!

nbroad

6 days ago

@osanseviero, when will the models be uploaded?

merve

6 days ago

@nbroad they're uploaded and linked to this paper page

nbroad

6 days ago

@merve, I swear that wasn't there when I asked 😅

MoritzLaurer

6 days ago · edited 6 days ago

Are there no -mix models trained on a mixture of tasks as part of this release, like with PaliGemma1? These were the most popular variants of PaliGemma1.