---
language:
- de
- en
- fr
- es
- ru
- it
---
# ModelOne-Vision

## 🚀 Overview

**ModelOne** is a multilingual vision model fine-tuned from the Microsoft Phi Vision architecture and weights. It is built to extract structured information from a wide range of documents, images, and other visual data, and it uses a specialized `output_format` token to control the structure of its output.

- **Base Model**: Microsoft Phi Vision
- **Training Data**: 7M+ samples across 70+ languages.
- **Output Flexibility**: Supports free text, CSV, JSON, YAML, and XML output.

## 💡 Join the Beta Program

[**Sign up for the Beta Program**](https://manufactai.com) to fine-tune, evaluate, and deploy this model on your own data and infrastructure.


## 🌍 Capabilities

ModelOne is ideal for:

- Extracting structured data from scanned and photographed documents.
- Interpreting complex tables, charts, and visual data representations.
- Performing multilingual OCR across a broad set of languages.
- Adapting outputs based on user-defined formats for seamless integration.
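
Because the model can emit several output formats, downstream code usually needs to parse the raw text according to the format that was requested. The following is a minimal sketch of that step using only the Python standard library. The function name `parse_model_output` and the format labels (`"json"`, `"csv"`, `"xml"`, `"text"`) are assumptions made for this illustration, not the literal values expected by the model's `output_format` token, and the sample payload is invented. YAML is left out because the standard library has no YAML parser; a third-party package such as PyYAML (`yaml.safe_load`) could fill that gap.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_model_output(text: str, output_format: str):
    """Parse raw model output according to the requested format.

    `output_format` is a hypothetical label used by this sketch only.
    """
    fmt = output_format.lower()
    if fmt == "json":
        return json.loads(text)
    if fmt == "csv":
        # Treat the first row as the header and return one dict per record.
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "xml":
        # Returns the root element of the parsed XML tree.
        return ET.fromstring(text)
    if fmt == "text":
        return text.strip()
    raise ValueError(f"Unsupported output format: {output_format!r}")


# Illustrative payload, not real model output.
invoice = parse_model_output('{"invoice_number": "INV-001", "total": 42.5}', "json")
print(invoice["total"])  # 42.5
```

Failing loudly on an unknown format, rather than returning the raw string, makes format mismatches visible early in an integration pipeline.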

## 📚 Training Data and Statistics

ModelOne was trained on a proprietary dataset covering a diverse range of documents and real-world images. The training set contains over 7 million samples, with a strong focus on multilingual coverage.

### 🗂️ Dataset Composition

| **Document Type**           | **Percentage** | **Details**                              |
|-----------------------------|----------------|-------------------------------------------|
| **Real-world Images**       | 29%            | Photos, scans of receipts, forms, ID cards |
| **Multipage Documents**     | 49%            | Contracts, reports, books (up to 123 pages)|
| **Single-page Documents**   | 14%            | Invoices, certificates, single-page forms  |
| **Visual Representations**  | 8%             | Tables, charts, graphs, diagrams           |
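
To give a rough sense of scale, the percentages above can be converted into approximate sample counts. This sketch assumes exactly 7,000,000 samples (the stated figure is "over 7 million", so these are lower bounds) and assumes the percentages are shares of samples; neither assumption is confirmed by the table itself.

```python
TOTAL_SAMPLES = 7_000_000  # assumption: lower bound taken from "over 7 million"

composition_percent = {
    "Real-world Images": 29,
    "Multipage Documents": 49,
    "Single-page Documents": 14,
    "Visual Representations": 8,
}

# The four categories should account for the whole dataset.
assert sum(composition_percent.values()) == 100

approx_counts = {
    name: TOTAL_SAMPLES * pct // 100
    for name, pct in composition_percent.items()
}

for name, count in approx_counts.items():
    print(f"{name}: ~{count:,}")
# Real-world Images: ~2,030,000
# Multipage Documents: ~3,430,000
# Single-page Documents: ~980,000
# Visual Representations: ~560,000
```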

### 🌍 Language Coverage

The dataset is balanced across six main languages, with 64 additional languages making up the remainder:

| **Language** | **Percentage** |
|--------------|----------------|
| **English**  | 14.27%         |
| **Spanish**  | 14.50%         |
| **French**   | 14.34%         |
| **German**   | 14.06%         |
| **Italian**  | 14.06%         |
| **Russian**  | 14.58%         |
| **Other**    | 14.19% (64 additional languages) |
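
As a quick consistency check, the shares in the table can be summed and the spread between the six main languages measured. This is a small sketch that simply restates the table values; `Decimal` is used so the two-decimal percentages add up exactly.

```python
from decimal import Decimal

language_share = {
    "English": Decimal("14.27"),
    "Spanish": Decimal("14.50"),
    "French": Decimal("14.34"),
    "German": Decimal("14.06"),
    "Italian": Decimal("14.06"),
    "Russian": Decimal("14.58"),
    "Other": Decimal("14.19"),  # 64 additional languages combined
}

total = sum(language_share.values())

# Spread between the largest and smallest of the six main languages.
main_languages = [v for k, v in language_share.items() if k != "Other"]
spread = max(main_languages) - min(main_languages)

print(f"Total: {total}%")    # Total: 100.00%
print(f"Spread: {spread}%")  # Spread: 0.52%
```

The main languages differ by about half a percentage point at most, which is what "approximately 14% each" amounts to in practice.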

### 🔑 Key Insights

- **Balanced Language Representation**: Each of the six main languages contributes approximately 14% of the data, which is intended to support comparable performance across them.
- **Document Diversity**: The data mixes single-page and multipage documents, real-world images, and visual representations, so training covers a broad range of inputs.
- **Robust Multilingual Capability**: Coverage across 70+ languages makes the model suitable for global applications that need broad language support.