Flo Schneider

floschne

https://www.inf.uni-hamburg.de/en/inst/ab/lt/people/florian-schneider.html

AI & ML interests

Large Vision-Language Models, Cross-modal Retrieval

Recent Activity

new activity about 2 months ago

floschne/m5b_vlod:Add language metadata

new activity about 2 months ago

floschne/m5b_vgr:Add language metadata

liked a dataset 3 months ago

floschne/gimmick-civqa

upvoted 2 articles 7 months ago

Article

Train 400x faster Static Embedding Models with Sentence Transformers

Jan 15, 2025 • 222

Article

🪆 Introduction to Matryoshka Embedding Models

Feb 23, 2024 • 191

upvoted a collection 7 months ago

GIMMICK

Datasets of the GIMMICK Benchmark • 3 items • Updated Jun 20, 2025 • 1

upvoted an article 8 months ago

Article

The Transformers Library: standardizing model definitions

May 15, 2025 • 121

upvoted a paper 8 months ago

Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20, 2025 • 133

upvoted 5 papers 11 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20, 2025 • 157

MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching

Paper • 2502.12852 • Published Feb 18, 2025 • 3

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19, 2025 • 212

GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking

Paper • 2502.13766 • Published Feb 19, 2025 • 3

How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild

Paper • 2502.12769 • Published Feb 18, 2025 • 3

upvoted a collection 12 months ago

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 11 items • Updated 18 days ago • 552

upvoted a collection about 1 year ago

Centurio

Artifacts of the paper "Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model" • 6 items • Updated Feb 4, 2025 • 4

upvoted 2 papers about 1 year ago

Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model

Paper • 2501.05122 • Published Jan 9, 2025 • 19

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 159

upvoted a collection about 1 year ago

Qwen2-VL

Vision-language model series based on Qwen2 • 16 items • Updated 18 days ago • 227

upvoted 3 papers about 1 year ago

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Paper • 2412.13871 • Published Dec 18, 2024 • 18

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 376

Progressive Multimodal Reasoning via Active Retrieval

Paper • 2412.14835 • Published Dec 19, 2024 • 73

upvoted a paper over 1 year ago

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8, 2024 • 111

upvoted a collection over 1 year ago

LLaVA-Onevision

LLaVa_Onevision models for single-image, multi-image, and video scenarios • 9 items • Updated Sep 18, 2024 • 16