GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities Paper • 2406.11768 • Published Jun 17 • 20
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22 • 26