LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 10 days ago • 99
CLAP: Contrastive Language-Audio Pretraining Collection CLAP is to audio what CLIP is to image. • 5 items • Updated Oct 31, 2023 • 8