Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2412.10704

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3 • 32
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17 • 25
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27 • 121
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17 • 21

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24 • 12
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 53
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 86
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27 • 31

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

Paper • 2412.10704 • Published 9 days ago • 14

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

Paper • 2412.10704 • Published 9 days ago • 14
How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published 4 days ago • 38

Multi doc visuals

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

Paper • 2412.10704 • Published 9 days ago • 14

VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

Paper • 2412.10704 • Published 9 days ago • 14

about 9 hours ago

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Paper • 2411.02337 • Published Nov 4 • 35
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7 • 49
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5 • 60
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper • 2410.08815 • Published Oct 11 • 43

HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

Paper • 2411.02959 • Published Nov 5 • 64
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization

Paper • 2411.02355 • Published Nov 4 • 46
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation

Paper • 2410.23090 • Published Oct 30 • 54
RARe: Retrieval Augmented Retrieval with In-Context Examples

Paper • 2410.20088 • Published Oct 26 • 5

Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation

Paper • 2409.12941 • Published Sep 19 • 23
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification

Paper • 2411.19638 • Published 24 days ago • 6
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

Paper • 2412.02592 • Published 20 days ago • 20
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation

Paper • 2412.10704 • Published 9 days ago • 14

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Paper • 2410.21169 • Published Oct 28 • 30
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Paper • 2409.02889 • Published Sep 4 • 55
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

Paper • 2411.04952 • Published Nov 7 • 28
Contextual Document Embeddings

Paper • 2410.02525 • Published Oct 3 • 18

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs