---
license: apache-2.0
---

## Setup Instructions

### Clone the Surya OCR GitHub Repository

```bash
git clone https://github.com/VikParuchuri/surya.git
cd surya
```

### Switch to v0.4.14

```bash
git checkout f7c6c04
```
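
To confirm that the checkout worked, you can compare `HEAD` against the commit above. The following is a minimal sketch of our own and is not part of Surya. The helper names `current_commit` and `matches_commit` are hypothetical, and the sketch assumes `git` is on your `PATH`.

```python
import subprocess

# Commit used for v0.4.14 in the checkout step above.
EXPECTED_COMMIT = "f7c6c04"


def matches_commit(head_sha: str, expected_prefix: str = EXPECTED_COMMIT) -> bool:
    """Return True if a full or abbreviated SHA starts with the expected prefix."""
    return head_sha.strip().lower().startswith(expected_prefix.lower())


def current_commit(repo_dir: str = ".") -> str:
    """Return the full SHA of HEAD in `repo_dir` via `git rev-parse HEAD`."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if repo_dir is not a git repository
    )
    return result.stdout.strip()


# Usage, from inside the cloned `surya` directory:
# print(matches_commit(current_commit()))
```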

### Install Dependencies

The Surya author does not provide a `requirements.txt` file, so we have uploaded the `environment.yml` from our conda environment. Use it to recreate the environment for `arabic_layout_model`, for example with `conda env create -f environment.yml`.
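
The `ArabicDoc` module in this repository is compiled as `ArabicDoc.cpython-310-x86_64-linux-gnu.so`. The recreated environment therefore needs to run CPython 3.10 on x86_64 Linux; on any other interpreter the `from ArabicDoc import ...` line fails with an `ImportError`. The sketch below checks this through `importlib.machinery.EXTENSION_SUFFIXES`, which lists the file suffixes the running interpreter tries when it imports compiled modules. The helper name `can_load_arabicdoc` is hypothetical. A matching suffix is necessary but not sufficient: other system libraries can still cause the load to fail.

```python
import importlib.machinery

# Suffix of the compiled ArabicDoc module shipped in this repository.
ARABICDOC_SUFFIX = ".cpython-310-x86_64-linux-gnu.so"


def can_load_arabicdoc(suffixes=None) -> bool:
    """Return True if the import system would look for ARABICDOC_SUFFIX.

    `suffixes` defaults to importlib.machinery.EXTENSION_SUFFIXES, the
    compiled-module suffixes the current interpreter tries on import.
    """
    if suffixes is None:
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return ARABICDOC_SUFFIX in suffixes


if can_load_arabicdoc():
    print("OK: extension suffix matches this interpreter")
else:
    print("Use CPython 3.10 on x86_64 Linux; this interpreter accepts:",
          importlib.machinery.EXTENSION_SUFFIXES)
```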

### ArabicDoc Pipeline

Download `ArabicDoc.cpython-310-x86_64-linux-gnu.so`, `10x_best.pt`, and the `surya` folder from this repository. Place all three in the same directory, because they depend on each other.
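
Before you run the pipeline, you can check that all three artifacts are in place. This sketch assumes you run it from the directory that holds them. The helper name `missing_artifacts` is hypothetical.

```python
from pathlib import Path

# Artifacts that must sit side by side in the same directory.
REQUIRED_FILES = ("ArabicDoc.cpython-310-x86_64-linux-gnu.so", "10x_best.pt")
REQUIRED_DIRS = ("surya",)


def missing_artifacts(directory: str = ".") -> list[str]:
    """Return the names of required files/folders not found in `directory`."""
    root = Path(directory)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


missing = missing_artifacts()
if missing:
    print("Missing from the current directory:", ", ".join(missing))
else:
    print("All ArabicDoc artifacts found.")
```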

```python
# `arabic_layout_model` comes from ArabicDoc.cpython-310-x86_64-linux-gnu.so in this repo.
# The compiled module only works on Linux (x86_64, CPython 3.10).
from ArabicDoc import arabic_layout_model
from surya.postprocessing.heatmap import draw_bboxes_on_image
from PIL import Image

image_path = "sample.jpg"
image = Image.open(image_path)

# Run layout detection; the model takes the image path.
bboxes = arabic_layout_model(image_path)

# Draw the detected boxes on the image and save the visualization.
plotted_image = draw_bboxes_on_image(bboxes, image)
plotted_image.save("sample_layout.png")
```
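
You can also crop the detected regions into separate images with PIL. This sketch makes one assumption: each entry in `bboxes` starts with `[x1, y1, x2, y2]` pixel coordinates, the layout that `draw_bboxes_on_image` reads. We have not confirmed the exact return type of `arabic_layout_model`. The helper name `crop_regions` and the output file names are hypothetical.

```python
from PIL import Image


def crop_regions(image: Image.Image, bboxes) -> list[Image.Image]:
    """Crop each box out of `image` and return the crops as PIL images.

    Assumes each box starts with [x1, y1, x2, y2] (left, top, right, bottom)
    in pixels. Coordinates are rounded to ints and clamped to the image
    bounds; empty or inverted boxes are skipped.
    """
    width, height = image.size
    crops = []
    for box in bboxes:
        x1, y1, x2, y2 = (int(round(v)) for v in box[:4])
        left, top = max(0, x1), max(0, y1)
        right, bottom = min(width, x2), min(height, y2)
        if right > left and bottom > top:
            crops.append(image.crop((left, top, right, bottom)))
    return crops


# Usage with the pipeline above:
# for i, region in enumerate(crop_regions(image, bboxes)):
#     region.save(f"region_{i}.png")
```

The function clamps coordinates to the image bounds so that a box that slightly overshoots the image edge still yields a valid crop instead of padded pixels.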

#### Refer to `benchmark.ipynb` for a comparison between the traditional Surya layout model and the new layout model.

#### Refer to the `results` folder to view the output images from both models.