ModelOne-Vision
Overview
ModelOne is a state-of-the-art multilingual model fine-tuned from the Microsoft Phi Vision architecture and weights. It is built to extract structured information from a wide range of documents, images, and visual data, and it uses a specialized output_format token to control the structure of its output.
- Base Model: Microsoft Phi Vision
- Training Data: 7M+ samples across 70+ languages.
- Output Flexibility: Supports free text, CSV, JSON, YAML, and XML output.
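Because the model can return several output formats, downstream code usually needs to parse the response according to the format that was requested. The following is a minimal sketch using only the Python standard library (plus the optional third-party PyYAML package for YAML). The function name `parse_model_output` and the `fmt` labels are hypothetical and chosen for illustration; they are not the model's actual output_format token syntax, which is not documented here.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_model_output(raw: str, fmt: str):
    """Parse a raw model response according to the requested format.

    `fmt` is a hypothetical caller-side label, not the model's own
    output_format token.
    """
    fmt = fmt.lower()
    if fmt == "text":
        return raw.strip()
    if fmt == "json":
        return json.loads(raw)
    if fmt == "csv":
        # The first row is treated as the header row.
        return list(csv.DictReader(io.StringIO(raw.strip())))
    if fmt == "xml":
        root = ET.fromstring(raw.strip())
        # Flatten only the direct children of the root element.
        return {child.tag: (child.text or "").strip() for child in root}
    if fmt == "yaml":
        import yaml  # PyYAML, an assumed third-party dependency

        return yaml.safe_load(raw)
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    print(parse_model_output('{"total": "12.50"}', "json"))
    print(parse_model_output("item,price\ncoffee,3.00\n", "csv"))
    print(parse_model_output("<invoice><total>12.50</total></invoice>", "xml"))
```

Keeping the parsing step separate from the inference call makes it easy to validate a response before passing it to other systems.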
Join the Beta Program
Sign up for the Beta Program to fine-tune, evaluate, and deploy this model on your own data and infrastructure.
Capabilities
ModelOne is ideal for:
- Extracting structured data from scanned and photographed documents.
- Interpreting complex tables, charts, and visual data representations.
- Performing multilingual OCR across a broad set of languages.
- Adapting output to user-defined formats for integration with downstream systems.
Training Data and Statistics
ModelOne was trained on a proprietary, high-quality dataset featuring a diverse range of documents and real-world images. The training process included over 7 million data points, with a strong focus on multilingual coverage.
Dataset Composition
Document Type | Percentage | Details |
---|---|---|
Real-world Images | 29% | Photos, scans of receipts, forms, ID cards |
Multipage Documents | 49% | Contracts, reports, books (up to 123 pages) |
Single-page Documents | 14% | Invoices, certificates, single-page forms |
Visual Representations | 8% | Tables, charts, graphs, diagrams |
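To put the percentages in absolute terms, the sketch below converts them into approximate sample counts. It assumes a total of exactly 7,000,000 samples, which is only the stated lower bound ("7M+"), so the results should be read as minimum estimates rather than reported figures. The names `TOTAL_SAMPLES`, `COMPOSITION`, and `estimate_counts` are illustrative.

```python
# Assumed lower bound; the card only states "7M+" samples.
TOTAL_SAMPLES = 7_000_000

# Percentages copied from the dataset composition table.
COMPOSITION = {
    "Real-world Images": 29,
    "Multipage Documents": 49,
    "Single-page Documents": 14,
    "Visual Representations": 8,
}


def estimate_counts(total: int, shares: dict) -> dict:
    """Convert integer percentage shares into approximate sample counts."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100%")
    return {name: total * pct // 100 for name, pct in shares.items()}


if __name__ == "__main__":
    for name, count in estimate_counts(TOTAL_SAMPLES, COMPOSITION).items():
        print(f"{name}: ~{count:,} samples")
```

Under this assumption, multipage documents account for roughly 3.43 million samples and visual representations for roughly 560,000.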
Language Coverage
Balanced representation across six main languages, with additional support for 64 more:
Language | Percentage |
---|---|
English | 14.27% |
Spanish | 14.50% |
French | 14.34% |
German | 14.06% |
Italian | 14.06% |
Russian | 14.58% |
Other | 14.19% (64 additional languages) |
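The balance in the table can be checked with a few lines of arithmetic. The sketch below confirms that the shares sum to 100%, measures the spread among the six main languages, and computes the mean share per additional language. The mean is only an average; the actual distribution across the 64 additional languages is not stated. The names `LANGUAGE_SHARES` and `summarize_shares` are illustrative.

```python
from decimal import Decimal

# Percentages copied from the language coverage table.
LANGUAGE_SHARES = {
    "English": Decimal("14.27"),
    "Spanish": Decimal("14.50"),
    "French": Decimal("14.34"),
    "German": Decimal("14.06"),
    "Italian": Decimal("14.06"),
    "Russian": Decimal("14.58"),
    "Other": Decimal("14.19"),
}
OTHER_LANGUAGE_COUNT = 64


def summarize_shares(shares: dict, other_count: int) -> dict:
    """Summarize how evenly the training data is spread across languages."""
    main = {lang: pct for lang, pct in shares.items() if lang != "Other"}
    return {
        "total": sum(shares.values()),
        # Gap between the largest and smallest main-language share.
        "main_spread": max(main.values()) - min(main.values()),
        # Average share per additional language (not the real distribution).
        "mean_other_share": shares["Other"] / other_count,
    }


if __name__ == "__main__":
    summary = summarize_shares(LANGUAGE_SHARES, OTHER_LANGUAGE_COUNT)
    print(f"Total: {summary['total']}%")
    print(f"Spread among main languages: {summary['main_spread']} points")
    print(f"Mean share per additional language: {summary['mean_other_share']:.2f}%")
```

The six main languages differ by about half a percentage point at most, while each additional language averages roughly 0.22% of the data.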
Key Insights
- Balanced Language Representation: Each of the six major languages contributes approximately 14% of the training data, which is intended to support comparable performance across them.
- Document Diversity: Includes a mix of single and multi-page documents, real-world images, and visual representations for comprehensive model training.
- Robust Multilingual Capability: Coverage across 70+ languages makes it suitable for global applications needing extensive linguistic support.