LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published 18 days ago • 24
Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception Paper • 2410.12788 • Published 24 days ago • 21
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published 20 days ago • 55
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published 19 days ago • 65
Improve Vision Language Model Chain-of-thought Reasoning Paper • 2410.16198 • Published 19 days ago • 17
LLM-based Optimization of Compound AI Systems: A Survey Paper • 2410.16392 • Published 19 days ago • 13
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs Paper • 2410.16267 • Published 19 days ago • 14
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities Paper • 2410.11190 • Published 26 days ago • 20
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation Paper • 2410.02458 • Published Oct 3 • 9
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25 • 99
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26 • 36
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published Sep 26 • 46
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published Sep 23 • 34
MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions Paper • 2409.12958 • Published Sep 19 • 7
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19 • 36