<p style="font-size:70px; font-weight:bold; text-align:center;">
Image Data Extractor
</p>
<hr>
# Overview:
The **Image Data Extractor** is a Python tool that uses **PaddleOCR** to extract text from images of visiting cards and structures it into key fields: name, designation, contact number, address, and company name. The **Mistral 7B** model performs the structuring; if it is unavailable, the system falls back to the **GLiNER urchade/gliner_medium-v2.1** model.
Both **Mistral 7B** and **GLiNER urchade/gliner_medium-v2.1** are used under the **Apache 2.0 license**.
---
# Installation Guide:
1. **Create and Activate a Virtual Environment**
```bash
python -m venv venv
source venv/bin/activate # For Linux/Mac
# or
venv\Scripts\activate # For Windows
```
2. **Install Required Libraries**
```bash
pip install -r requirements.txt
```
3. **Set up Hugging Face Token**
- Add your Hugging Face token to the `.env` file before starting the application:
```bash
HF_TOKEN=<your_huggingface_token>
```
4. **Run the Application**
- If Docker is being used:
```bash
docker-compose up --build
```
- Without Docker:
```bash
python app.py
```
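The token stored in `.env` can be read at startup along the following lines. This is a minimal standard-library sketch, not the project's actual loader: `load_env_file` and `get_hf_token` are hypothetical names, and the project may instead rely on a package such as `python-dotenv` or on Docker Compose to inject the variable.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path: str = ".env") -> str:
    """Return HF_TOKEN from the environment, else from the .env file."""
    token = os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN", "")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to the .env file.")
    return token
```

Failing early with a clear message when the token is missing makes a misconfigured deployment easier to diagnose than an authentication error raised later by the model call.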
---
# File Structure Overview:
```
ImageDataExtractor/
│
├── app.py                  # Main Flask app
├── requirements.txt        # Dependencies
├── Dockerfile              # Docker container setup
├── docker-compose.yml      # Docker Compose setup
│
├── utility/
│   └── utils.py            # PaddleOCR integration, Image preprocessing and Mistral model processing
│
├── template/
│   ├── index.html          # UI for image uploads
│   └── result.html         # Display extracted results
│
├── Backup/
│   ├── modules/            # Base classes for data processing models
│   │   ├── base.py
│   │   ├── data_proc.py
│   │   ├── evaluator.py
│   │   ├── layers.py
│   │   ├── run_evaluation.py
│   │   ├── span_rep.py
│   │   └── token_rep.py
│   ├── backup.py           # Backup handling Gliner Model integration and backup logic
│   ├── model.py
│   ├── save_load.py
│   └── train.py
│
└── .env                    # Environment variables (includes Hugging Face token)
```
---
# Program Overview:
### PaddleOCR Integration (utility/utils.py):
- **Text Extraction**: **PaddleOCR** extracts text from visiting card images (PNG, JPG, JPEG).
- **Preprocessing**: Applies basic image preprocessing before OCR to improve text recognition.
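The text extraction step might look roughly like the sketch below. It assumes the PaddleOCR 2.x-style interface, where `PaddleOCR(...).ocr(path, cls=True)` returns, for each image, a list of `[box, (text, confidence)]` entries (or `None` when nothing is detected); newer PaddleOCR releases expose a different result format. The function names and the `min_confidence` threshold are illustrative assumptions, not taken from `utility/utils.py`.

```python
from typing import Any, List, Optional


def lines_from_ocr_result(
    result: Optional[List[Any]], min_confidence: float = 0.5
) -> List[str]:
    """Flatten a PaddleOCR 2.x-style result into a list of text lines."""
    lines: List[str] = []
    for page in result or []:
        # A page with no detected text is reported as None.
        for entry in page or []:
            _box, (text, confidence) = entry
            # Drop low-confidence or empty detections (assumed threshold).
            if confidence >= min_confidence and text.strip():
                lines.append(text.strip())
    return lines


def extract_text(image_path: str) -> List[str]:
    """Run OCR on one image file (requires the paddleocr package)."""
    from paddleocr import PaddleOCR  # imported lazily: heavy dependency

    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    return lines_from_ocr_result(ocr.ocr(image_path, cls=True))
```

Keeping the result flattening separate from the OCR call means it can be checked with hand-written sample data, without loading the OCR models.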
### Mistral 7B Integration (utility/utils.py):
- **Data Structuring**: After text extraction, the **Mistral 7B** model structures the extracted text into fields such as name, designation, contact number, address, and company name.
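One way to approach the structuring step is to ask the model for a JSON object and then parse its reply defensively. The sketch below covers only the prompt and the parsing; the call to Mistral 7B itself is omitted. The field keys, the prompt wording, and the function names are assumptions for illustration, not the project's actual prompt.

```python
import json
import re
from typing import Dict, List

# Assumed key names for the five fields listed in this README.
FIELDS = ["name", "designation", "contact_number", "address", "company_name"]


def build_prompt(ocr_lines: List[str]) -> str:
    """Ask the model to return the card fields as a single JSON object."""
    keys = ", ".join(FIELDS)
    text = "\n".join(ocr_lines)
    return (
        "Extract the following fields from this visiting card text and "
        f"answer with one JSON object using exactly these keys: {keys}. "
        "Use an empty string for anything that is missing.\n\n"
        f"Card text:\n{text}"
    )


def parse_reply(reply: str) -> Dict[str, str]:
    """Take the span from the first '{' to the last '}' and keep known keys."""
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(match.group(0))
    # Unknown keys are ignored; missing or null values become empty strings.
    return {field: str(data.get(field) or "").strip() for field in FIELDS}
```

Because `parse_reply` raises `ValueError` on an unusable reply (`json.JSONDecodeError` is a subclass), a caller can treat a malformed answer the same way as an unavailable model.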
### Fallback Mechanism (Backup/backup.py):
- **GLiNER urchade/gliner_medium-v2.1 Model**: If the Mistral model is unavailable, the system uses the **GLiNER urchade/gliner_medium-v2.1** model to perform the same task, so the service keeps running.
- **Error Handling**: Detects model failures and switches to the fallback model.
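The fallback behaviour described above can be sketched as a small wrapper that tries a primary extractor and switches to a backup on failure. This is an illustration of the pattern under assumed names (`extract_with_fallback`, `Extractor`), not the code in `Backup/backup.py`; both extractors are passed in as plain callables.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)

# An extractor maps OCR text lines to a dict of structured fields.
Extractor = Callable[[List[str]], Dict[str, str]]


def extract_with_fallback(
    ocr_lines: List[str], primary: Extractor, backup: Extractor
) -> Dict[str, str]:
    """Try the primary extractor; on any failure, use the backup instead."""
    try:
        return primary(ocr_lines)
    except Exception as exc:  # e.g. network error, rate limit, bad reply
        logger.warning("Primary model failed (%s); using backup model.", exc)
        return backup(ocr_lines)
```

In this project the primary callable would presumably wrap the Mistral 7B request and the backup would wrap the GLiNER model. Errors raised by the backup are deliberately not caught, so a total failure still surfaces to the caller.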
### Web Interface (app.py):
- **Flask API**: Provides endpoints for image uploads and displays the results in a structured manner.
- **HTML Interface**: A frontend for users to upload images of visiting cards and view the parsed results.
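An upload endpoint typically validates the file name before passing the image to OCR. The helper below is a hypothetical sketch of such a check, limited to the PNG, JPG, and JPEG formats this README lists; `is_allowed_image` and `ALLOWED_EXTENSIONS` are illustrative names, not identifiers from `app.py`.

```python
from pathlib import Path

# Extensions accepted for visiting card images.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def is_allowed_image(filename: str) -> bool:
    """Return True when the file name ends in a supported image extension."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

A Flask upload handler could call this on the uploaded file's name and respond with an error when it returns `False`.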
---
# Tree Map of the Program:
```
app.py
├── Handles Flask API and web interface
├── Manages file upload
├── Extracts text with PaddleOCR
├── Processes text with Mistral 7B
└── Displays structured results
utility/utils.py
├── PaddleOCR for text extraction
└── Mistral 7B for data structuring
Backup/backup.py
├── GLiNER urchade/gliner_medium-v2.1 as fallback
└── Backup and error handling
```
---
# Licensing:
- **Mistral 7B model** is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
- **GLiNER urchade/gliner_medium-v2.1 model** is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
---
# Main Task:
The primary objective is to extract and structure data from visiting cards. The system identifies and organizes:
- **Name**
- **Designation**
- **Phone Number**
- **Address**
- **Company Name**
---
# References:
- [PaddleOCR Documentation](https://github.com/PaddlePaddle/PaddleOCR)
- [Mistral 7B Documentation](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/blob/main/README.md)
- [GLiNER urchade/gliner_medium-v2.1 Documentation](https://huggingface.co/urchade/gliner_medium-v2.1/blob/main/README.md)
- [Flask Documentation](https://flask.palletsprojects.com/)
- [Docker Documentation](https://docs.docker.com/)
- [Virtual Environments in Python](https://docs.python.org/3/tutorial/venv.html)
---