SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration Paper • 2411.10958 • Published 5 days ago • 33
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published 6 days ago • 87
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation Paper • 2411.07975 • Published 9 days ago • 24
Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement Paper • 2411.00622 • Published 20 days ago • 2
TableGPT2: A Large Multimodal Model with Tabular Data Integration Paper • 2411.02059 • Published 17 days ago • 5
From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models Paper • 2411.05036 • Published 15 days ago • 1
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published 14 days ago • 48
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published 17 days ago • 62
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis Paper • 2411.01156 • Published 20 days ago • 4
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents Paper • 2410.03450 • Published Oct 4 • 36
TableRAG: Million-Token Table Understanding with Language Models Paper • 2410.04739 • Published Oct 7 • 1
Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning Paper • 2410.03103 • Published Oct 4 • 6