AZA
Zillionairs
0 followers
·
10 following
abraham_airs
AI & ML interests
None yet
Recent Activity
liked a model 8 days ago
Qwen/Qwen2.5-Coder-32B-Instruct
replied to singhsidhukuldeep's post 8 days ago
Exciting breakthrough in Document AI! Researchers from UNC Chapel Hill and Bloomberg have developed M3DocRAG, a revolutionary framework for multi-modal document understanding.

The innovation lies in its ability to handle complex document scenarios that traditional systems struggle with:
- Process 40,000+ pages across 3,000+ documents
- Answer questions requiring information from multiple pages
- Understand visual elements like charts, tables, and figures
- Support both closed-domain (single document) and open-domain (multiple documents) queries

Under the hood, M3DocRAG operates through three sophisticated stages:

>> Document Embedding:
- Converts PDF pages to RGB images
- Uses ColPali to project both text queries and page images into a shared embedding space
- Creates dense visual embeddings for each page while maintaining visual information integrity

>> Page Retrieval:
- Employs MaxSim scoring to compute relevance between queries and pages
- Implements inverted file indexing (IVFFlat) for efficient search
- Reduces retrieval latency from 20s to under 2s when searching 40K+ pages
- Supports approximate nearest neighbor search via Faiss

>> Question Answering:
- Leverages Qwen2-VL 7B as the multi-modal language model
- Processes retrieved pages through a visual encoder
- Generates answers considering both textual and visual context

The results are impressive:
- State-of-the-art performance on MP-DocVQA benchmark
- Superior handling of non-text evidence compared to text-only systems
- Significantly better performance on multi-hop reasoning tasks

This is a game-changer for industries dealing with large document volumes: finance, healthcare, and legal sectors can now process documents more efficiently while preserving crucial visual context.
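The Page Retrieval stage in the post names MaxSim scoring without showing it. Below is a minimal sketch, assuming the usual late-interaction definition: for each query token embedding, take the maximum dot product over a page's patch embeddings, then sum those maxima over the query tokens. The function names `maxsim_score` and `rank_pages` and the toy vectors are hypothetical and not taken from M3DocRAG. The exhaustive loop stands in for the approximate Faiss IVFFlat search the post mentions.

```python
import numpy as np


def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction relevance between one query and one page.

    query_emb: (n_query_tokens, dim) array of query token embeddings.
    page_emb:  (n_page_patches, dim) array of page patch embeddings.
    """
    # Pairwise dot products, shape (n_query_tokens, n_page_patches).
    sim = query_emb @ page_emb.T
    # Best-matching patch for each query token, summed over tokens.
    return float(sim.max(axis=1).sum())


def rank_pages(query_emb: np.ndarray, pages: list, top_k: int = 2) -> list:
    """Score every page exhaustively and return the top_k (index, score) pairs."""
    scores = [maxsim_score(query_emb, page) for page in pages]
    order = sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]


# Toy data (assumed for illustration): 2-dimensional embeddings.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
page_a = np.array([[1.0, 0.0], [0.0, 1.0]])  # matches both query tokens
page_b = np.array([[1.0, 0.0]])              # matches only the first token

print(rank_pages(query, [page_b, page_a]))   # [(1, 2.0), (0, 1.0)]
```

Exhaustive scoring like this grows linearly with the number of pages, which is why the post describes an inverted file index to shrink the candidate set before scoring.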
Organizations
None yet
Zillionairs's activity
liked a model 8 days ago
Qwen/Qwen2.5-Coder-32B-Instruct
Text Generation • Updated 24 days ago • 294k • 1.28k