WebashalarForML


	<p style="font-size:70px; font-weight:bold; text-align:center;">
	Image Data Extractor
	</p>
	<hr>

	# Overview:
	The Image Data Extractor is a Python-based tool that uses PaddleOCR to extract text from images of visiting cards and organizes it into key fields: name, designation, contact number, address, and company name. The Mistral 7B model structures the extracted text; if it is unavailable, the system falls back to the GLiNER `urchade/gliner_medium-v2.1` model.
	Both Mistral 7B and GLiNER `urchade/gliner_medium-v2.1` are used under the Apache 2.0 license.

	---
	# Installation Guide:

	1. Create and Activate a Virtual Environment
	```bash
	python -m venv venv
	source venv/bin/activate # For Linux/Mac
	# or
	venv\Scripts\activate # For Windows
	```

	2. Install Required Libraries
	```bash
	pip install -r requirements.txt
	```

	3. Run the Application
	- If Docker is being used:
	```bash
	docker-compose up --build
	```
	- Without Docker:
	```bash
	python app.py
	```

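	4. Set up Hugging Face Token
	- Add your Hugging Face token to the `.env` file in the project root. Do this before starting the application in step 3 so the token is available at startup: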
	```bash
	HF_TOKEN=<your_huggingface_token>
	```
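How the application reads the token is not shown in this README. A minimal sketch, assuming a plain `KEY=VALUE` `.env` file; the helper names `load_env_file` and `get_hf_token` are hypothetical, and many projects use the `python-dotenv` package for this instead:

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines and return them as a dict.

    Hypothetical helper for illustration. Blank lines, comment lines
    starting with '#', and lines without '=' are skipped; surrounding
    quotes around the value are stripped.
    """
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_hf_token(path=".env"):
    """Return HF_TOKEN from the process environment, else from the .env file."""
    token = os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to the .env file")
    return token
```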
	---
	# File Structure Overview:

	```
	ImageDataExtractor/
	│
	├── app.py                 # Main Flask app
	├── requirements.txt       # Dependencies
	├── Dockerfile             # Docker container setup
	├── docker-compose.yml     # Docker Compose setup
	│
	├── utility/
	│   └── utils.py           # PaddleOCR integration, image preprocessing, and Mistral model processing
	│
	├── template/
	│   ├── index.html         # UI for image uploads
	│   └── result.html        # Display of extracted results
	│
	├── Backup/
	│   ├── modules/           # Base classes for data processing models
	│   │   ├── base.py
	│   │   ├── data_proc.py
	│   │   ├── evaluator.py
	│   │   ├── layers.py
	│   │   ├── run_evaluation.py
	│   │   ├── span_rep.py
	│   │   └── token_rep.py
	│   ├── backup.py          # GLiNER model integration and fallback logic
	│   ├── model.py
	│   ├── save_load.py
	│   └── train.py
	│
	└── .env                   # Environment variables (includes Hugging Face token)
	```
	---
	# Program Overview:

	### PaddleOCR Integration (utility/utils.py):
	- Text Extraction: The tool utilizes PaddleOCR to extract text from image-based inputs (PNG, JPG, JPEG) of visiting cards.
	- Preprocessing: Handles basic image preprocessing to enhance text recognition for OCR.
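
The contents of `utility/utils.py` are not shown here. As a rough sketch, assuming the nested result layout returned by `PaddleOCR.ocr()` in PaddleOCR 2.x (one list per image, each entry a `[box, (text, confidence)]` pair), the recognized lines could be filtered and joined as follows. The function name `ocr_result_to_text` and the `min_confidence` threshold are illustrative and not taken from the project:

```python
def ocr_result_to_text(ocr_result, min_confidence=0.5):
    """Flatten a PaddleOCR-style result into newline-separated text.

    Assumed layout: [[ [box, (text, confidence)], ... ], ...] with one
    inner list per image. An image with no detected text may be None.
    Entries below `min_confidence` or with empty text are dropped.
    """
    lines = []
    for page in ocr_result or []:
        for entry in page or []:
            _box, (text, confidence) = entry
            cleaned = text.strip()
            if cleaned and confidence >= min_confidence:
                lines.append(cleaned)
    return "\n".join(lines)
```

Keeping the detections in their original order preserves the rough top-to-bottom layout of the card, which tends to help whichever model structures the text afterwards.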

	### Mistral 7B Integration (utility/utils.py):
	- Data Structuring: After text extraction, the Mistral 7B model processes the extracted data, structuring it into fields such as name, designation, contact number, address, and company name.
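
The prompt and response format used with Mistral 7B are not documented in this README. One plausible shape, assuming the model is asked to reply with a JSON object, is sketched below. `build_prompt`, `parse_model_reply`, and the field keys are hypothetical names chosen for illustration:

```python
import json

# Field keys mirror the fields named in this README; the exact key
# spelling is an assumption.
FIELDS = ("name", "designation", "contact_number", "address", "company_name")


def build_prompt(ocr_text):
    """Build an instruction asking the model for a JSON object of card fields."""
    field_list = ", ".join(FIELDS)
    return (
        "Extract the following fields from the visiting card text and "
        f"reply with a single JSON object using exactly these keys: {field_list}. "
        "Use null for any field that is not present.\n\n"
        f"Card text:\n{ocr_text}"
    )


def parse_model_reply(reply):
    """Parse the span from the first '{' to the last '}' as JSON.

    Unknown keys are discarded and missing fields are set to None.
    Raises ValueError if the reply contains no usable JSON object.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start:end + 1])
    return {field: data.get(field) for field in FIELDS}
```

Slicing out the braces before parsing tolerates replies that wrap the JSON in extra prose, which instruction-tuned models often do.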

	### Fallback Mechanism (Backup/backup.py):
	- GLiNER `urchade/gliner_medium-v2.1` Model: If the Mistral model is unavailable, the system uses the GLiNER `urchade/gliner_medium-v2.1` model to perform the same task, so extraction keeps working.
	- Error Handling: Handles model availability failures and switches to the fallback model.
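
The actual logic in `Backup/backup.py` is not reproduced in this README. The general pattern can be sketched as a wrapper that tries one extractor and switches to another on failure. `extract_with_fallback` and both callables are hypothetical stand-ins for the Mistral and GLiNER code paths:

```python
import logging

logger = logging.getLogger(__name__)


def extract_with_fallback(text, primary, fallback):
    """Run `primary(text)`; if it raises, run `fallback(text)` instead.

    Both arguments are callables that take the OCR text and return a
    dict of fields. Returns a (fields, source) tuple so callers can
    tell which path produced the result.
    """
    try:
        return primary(text), "primary"
    except Exception as exc:  # broad on purpose: any failure triggers the fallback
        logger.warning("Primary model failed (%s); using fallback", exc)
        return fallback(text), "fallback"
```

Returning the source label alongside the fields makes it easy to log or display which model handled a given card.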

	### Web Interface (app.py):
	- Flask API: Provides endpoints for image uploads and displays the results in a structured manner.
	- HTML Interface: A frontend for users to upload images of visiting cards and view the parsed results.
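
The routes in `app.py` are not listed in this README. As one small piece of what an upload endpoint might do, the sketch below checks a filename against the three image formats mentioned earlier (PNG, JPG, JPEG). The names `ALLOWED_EXTENSIONS` and `is_allowed_image` are assumptions. In a Flask view, a check like this would typically run on the uploaded file's `filename` before saving, often together with `werkzeug.utils.secure_filename`:

```python
from pathlib import PurePath

# Formats taken from the PaddleOCR section of this README.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg"}


def is_allowed_image(filename):
    """Return True if the final file extension is an accepted image type.

    Only the last suffix counts, so 'card.png.exe' is rejected, and the
    comparison is case-insensitive.
    """
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    return suffix in ALLOWED_EXTENSIONS
```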
	---
	# Tree Map of the Program:

	```
	app.py
	├── Handles Flask API and web interface
	├── Manages file upload
	├── Extracts text with PaddleOCR
	├── Processes text with Mistral 7B
	└── Displays structured results

	utility/utils.py
	├── PaddleOCR for text extraction
	└── Mistral 7B for data structuring

	Backup/backup.py
	├── GLiNER urchade/gliner_medium-v2.1 as fallback
	└── Backup and error handling
	```
	---
	# Licensing:
	- Mistral 7B model is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
	- GLiNER `urchade/gliner_medium-v2.1` model is used under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

	---
	# Main Task:
	The primary objective is to extract and structure data from visiting cards. The system identifies and organizes:
	- Name
	- Designation
	- Phone Number
	- Address
	- Company Name
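
The README does not specify how the result is represented in code. One simple option, shown purely as an illustration, is a dataclass with one optional attribute per field. The class name `VisitingCard` and its attribute names are hypothetical:

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VisitingCard:
    """Container for the five fields the tool aims to extract."""

    name: Optional[str] = None
    designation: Optional[str] = None
    phone_number: Optional[str] = None
    address: Optional[str] = None
    company_name: Optional[str] = None

    def missing_fields(self):
        """List the field names that are empty or None, in declaration order."""
        return [key for key, value in asdict(self).items() if not value]
```

For example, `VisitingCard(name="John Doe").missing_fields()` lists the four fields still to be filled, which can be useful when deciding whether a card needs manual review.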

	---
	# References:

	- [PaddleOCR Documentation](https://github.com/PaddlePaddle/PaddleOCR)
	- [Mistral 7B Documentation](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/blob/main/README.md)
	- [GLiNER urchade/gliner_medium-v2.1 Documentation](https://huggingface.co/urchade/gliner_medium-v2.1/blob/main/README.md)
	- [Flask Documentation](https://flask.palletsprojects.com/)
	- [Docker Documentation](https://docs.docker.com/)
	- [Virtual Environments in Python](https://docs.python.org/3/tutorial/venv.html)
	---