<p style="font-size:70px; font-weight:bold; text-align:center;">
Image Data Extractor
</p>

<hr>

# Overview:

The **Image Data Extractor** is a Python tool that extracts text from images of visiting cards with **PaddleOCR** and organizes it into key fields: name, designation, contact number, address, and company name. The **Mistral 7B model** structures the extracted text; if it is unavailable, the system falls back to the **GLiNER `urchade/gliner_medium-v2.1`** model.

Both **Mistral 7B** and **GLiNER `urchade/gliner_medium-v2.1`** are used under the **Apache 2.0 license**.

---
# Installation Guide:

1. **Create and Activate a Virtual Environment**

```bash
python -m venv venv
source venv/bin/activate    # For Linux/Mac
# or
venv\Scripts\activate       # For Windows
```

2. **Install Required Libraries**

```bash
pip install -r requirements.txt
```

3. **Set Up the Hugging Face Token**

- Add your Hugging Face token to the `.env` file before starting the application:

```bash
HF_TOKEN=<your_huggingface_token>
```

4. **Run the Application**

- With Docker:

```bash
docker-compose up --build
```

- Without Docker:

```bash
python app.py
```
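
The `HF_TOKEN` entry in `.env` has to reach the Python process somehow. The sketch below shows one way to read it and fail early with a clear message. The helper names `get_hf_token` and `load_env_file` are hypothetical, and the use of `python-dotenv` is an assumption; the project may load the file differently.

```python
import os


def load_env_file(path=".env"):
    """Copy entries from a .env file into os.environ.

    Assumes python-dotenv is installed; returns False when it is not.
    """
    try:
        from dotenv import load_dotenv  # optional dependency (assumption)
    except ImportError:
        return False
    return load_dotenv(path)


def get_hf_token(env=None):
    """Return the Hugging Face token, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token or token.startswith("<"):
        # Empty value, or the unedited "<your_huggingface_token>" placeholder.
        raise RuntimeError("HF_TOKEN is not set; add it to the .env file.")
    return token
```
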

---

# File Structure Overview:

```
ImageDataExtractor/
│
├── app.py                  # Main Flask app
├── requirements.txt        # Dependencies
├── Dockerfile              # Docker container setup
├── docker-compose.yml      # Docker Compose setup
│
├── utility/
│   └── utils.py            # PaddleOCR integration, image preprocessing, and Mistral model processing
│
├── template/
│   ├── index.html          # UI for image uploads
│   └── result.html         # Display extracted results
│
├── Backup/
│   ├── modules/            # Base classes for data processing models
│   │   ├── base.py
│   │   ├── data_proc.py
│   │   ├── evaluator.py
│   │   ├── layers.py
│   │   ├── run_evaluation.py
│   │   ├── span_rep.py
│   │   └── token_rep.py
│   ├── backup.py           # Backup handling: GLiNER model integration and backup logic
│   ├── model.py
│   ├── save_load.py
│   └── train.py
│
└── .env                    # Environment variables (includes Hugging Face token)
```

---
# Program Overview:

### PaddleOCR Integration (utility/utils.py):

- **Text Extraction**: **PaddleOCR** extracts text from visiting-card images (PNG, JPG, JPEG).
- **Preprocessing**: Basic image preprocessing is applied before OCR to improve text recognition.
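
As a rough illustration of the text-extraction step, the sketch below runs PaddleOCR on one image and flattens the result into plain text lines. It assumes the PaddleOCR 2.x Python API (`PaddleOCR(...).ocr(path, cls=True)` returning `[box, (text, confidence)]` entries per image); newer releases use a different interface. The function names and the `min_confidence` threshold are hypothetical and not taken from `utility/utils.py`.

```python
def run_ocr(image_path):
    """Run PaddleOCR on one image (assumes the PaddleOCR 2.x API)."""
    from paddleocr import PaddleOCR  # imported lazily: heavy dependency

    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    return ocr.ocr(image_path, cls=True)


def extract_lines(ocr_result, min_confidence=0.5):
    """Flatten a 2.x-style result into text lines, dropping weak detections."""
    lines = []
    for page in ocr_result or []:
        # A page can be None when no text was detected in the image.
        for _box, (text, confidence) in page or []:
            text = text.strip()
            if text and confidence >= min_confidence:
                lines.append(text)
    return lines
```

Keeping the flattening separate from the OCR call makes it possible to check the filtering logic with a hand-written result, without loading the OCR models.
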

### Mistral 7B Integration (utility/utils.py):

- **Data Structuring**: After text extraction, the **Mistral 7B model** organizes the extracted text into fields such as name, designation, contact number, address, and company name.
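
One plausible way to get structured fields out of a language model is to ask for JSON and parse the reply defensively. The sketch below shows only that prompt-and-parse step and deliberately leaves out the model call itself. The key names in `FIELDS`, the prompt wording, and both function names are assumptions made for illustration, not the project's actual prompt.

```python
import json

# Hypothetical key names for the five fields the tool extracts.
FIELDS = ["name", "designation", "contact_number", "address", "company_name"]


def build_prompt(ocr_lines):
    """Ask the model for a single JSON object with the expected keys."""
    card_text = "\n".join(ocr_lines)
    return (
        "Extract the following fields from this visiting card text and reply "
        f"with a single JSON object using the keys {FIELDS}. "
        "Use null for anything that is missing.\n\n"
        f"Card text:\n{card_text}"
    )


def parse_reply(reply):
    """Parse the text between the first '{' and the last '}' as JSON."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start : end + 1])
    # Keep only the expected keys; absent ones become None.
    return {field: data.get(field) for field in FIELDS}
```

Slicing out the braces tolerates replies that wrap the JSON in extra chatter, which instruction-tuned models often do.
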

### Fallback Mechanism (Backup/backup.py):

- **GLiNER `urchade/gliner_medium-v2.1` Model**: If the Mistral model is unavailable, the system uses the **GLiNER `urchade/gliner_medium-v2.1` model** to perform the same task, so the service keeps running.
- **Error Handling**: Detects when the primary model is unavailable and switches to the fallback.
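
The fallback behaviour can be sketched as a small wrapper that tries the primary model and switches to the backup when it raises. Both models are passed in as plain callables, so the names `structure_with_fallback`, `primary`, and `fallback` are hypothetical. `group_entities` assumes the list-of-dicts shape (`text`, `label`, `score`) that GLiNER's `predict_entities` returns; the actual logic in `Backup/backup.py` may differ.

```python
import logging

logger = logging.getLogger(__name__)


def structure_with_fallback(text, primary, fallback):
    """Try the primary model; on any failure, log it and use the fallback."""
    try:
        return primary(text), "primary"
    except Exception as exc:  # e.g. network error, missing token, model offline
        logger.warning("Primary model failed (%s); using fallback", exc)
        return fallback(text), "fallback"


def group_entities(entities):
    """Collapse GLiNER-style entity dicts into {label: best-scoring text}."""
    best = {}
    for ent in entities:
        label = ent["label"]
        if label not in best or ent["score"] > best[label]["score"]:
            best[label] = ent
    return {label: ent["text"] for label, ent in best.items()}
```

Returning which path was taken alongside the result makes it easy to log or display when the backup model produced the output.
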

### Web Interface (app.py):

- **Flask API**: Provides endpoints that accept image uploads and display the structured results.
- **HTML Interface**: A frontend where users upload visiting-card images and view the parsed results.
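
A minimal sketch of an upload endpoint is shown below, limited to the supported image types (PNG, JPG, JPEG). The `/upload` route, the `image` form field, the `uploads` directory, and the `process_image` callable are all assumptions made for illustration. The real `app.py` renders HTML templates, whereas this sketch returns JSON to stay short.

```python
import os

ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}


def allowed_file(filename):
    """Accept only the image types the extractor supports."""
    _, ext = os.path.splitext(filename or "")
    return ext.lower().lstrip(".") in ALLOWED_EXTENSIONS


def create_app(process_image):
    """Build a small Flask app.

    `process_image` is a hypothetical callable that takes a saved file path
    and returns a dict of extracted fields.
    """
    from flask import Flask, jsonify, request  # imported lazily
    from werkzeug.utils import secure_filename

    app = Flask(__name__)
    os.makedirs("uploads", exist_ok=True)

    @app.route("/upload", methods=["POST"])
    def upload():
        file = request.files.get("image")
        if file is None or not allowed_file(file.filename):
            return jsonify(error="Upload a PNG, JPG, or JPEG image."), 400
        # secure_filename strips path separators and other unsafe characters.
        path = os.path.join("uploads", secure_filename(file.filename))
        file.save(path)
        return jsonify(process_image(path))

    return app
```
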

---

# Tree Map of the Program:

```
app.py
├── Handles Flask API and web interface
├── Manages file upload
├── Extracts text with PaddleOCR
├── Processes text with Mistral 7B
└── Displays structured results

utility/utils.py
├── PaddleOCR for text extraction
└── Mistral 7B for data structuring

Backup/backup.py
├── GLiNER urchade/gliner_medium-v2.1 as fallback
└── Backup and error handling
```

---
# Licensing:

- The **Mistral 7B model** is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
- The **GLiNER `urchade/gliner_medium-v2.1` model** is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

---
# Main Task:

The primary objective is to extract and structure data from visiting cards. The system identifies and organizes:

- **Name**
- **Designation**
- **Contact Number**
- **Address**
- **Company Name**
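
The five target fields map naturally onto a small record type. The `VisitingCard` dataclass below is a hypothetical illustration of such a record, not a class from the project. Each field defaults to `None` so that partially recognised cards can still be represented.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VisitingCard:
    """One extracted card; a field stays None when it could not be found."""

    name: Optional[str] = None
    designation: Optional[str] = None
    contact_number: Optional[str] = None
    address: Optional[str] = None
    company_name: Optional[str] = None

    def missing_fields(self):
        """List the fields the extractor failed to fill."""
        return [key for key, value in asdict(self).items() if not value]


card = VisitingCard(name="John Doe", company_name="Acme Corp")
print(card.missing_fields())  # ['designation', 'contact_number', 'address']
```
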

---

# References:

- [PaddleOCR Documentation](https://github.com/PaddlePaddle/PaddleOCR)
- [Mistral 7B Documentation](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/blob/main/README.md)
- [GLiNER `urchade/gliner_medium-v2.1` Documentation](https://huggingface.co/urchade/gliner_medium-v2.1/blob/main/README.md)
- [Flask Documentation](https://flask.palletsprojects.com/)
- [Docker Documentation](https://docs.docker.com/)
- [Virtual Environments in Python](https://docs.python.org/3/tutorial/venv.html)

---