---
license: cc-by-nc-sa-4.0
language:
- en
library_name: transformers
tags:
- finance
metrics:
- accuracy
---

## Model

This model is a fine-tuned version of [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) for document image classification. It was trained on the [Financial Documents Clustering Kaggle Dataset](https://www.kaggle.com/datasets/drcrabkg/financial-statements-clustering).

It classifies document images into one of the following five classes:

- Income Statements
- Balance Sheets
- Cash Flows
- Notes
- Others
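
The card does not show how to read a prediction. The following is a minimal sketch. It assumes the checkpoint loads as `LayoutLMv3ForSequenceClassification` and that its `config.id2label` holds the five class names. The helper turns one document's logits into a class name and softmax probabilities. `decode_prediction` is a hypothetical name. The label order is not documented here, so read it from the model config rather than hard-coding it.

```python
import math
from typing import Dict, List, Sequence, Tuple


def decode_prediction(
    logits: Sequence[float], id2label: Dict[int, str]
) -> Tuple[str, List[float]]:
    """Return the predicted class name and softmax probabilities for one document."""
    # Subtract the largest logit before exponentiating, for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest-probability class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs
```

With a loaded model, a call such as `decode_prediction(model(**encoding).logits[0].tolist(), model.config.id2label)` would apply it. Here `encoding` is the processor output for a single page.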

## Training

This model was trained on OCR output (words and their bounding boxes) from [EasyOCR](https://github.com/JaidedAI/EasyOCR). The default LayoutLMv3 processor uses the Tesseract OCR engine instead.
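
The model saw EasyOCR text at training time, so inference inputs should presumably come from EasyOCR as well, not from the processor's built-in Tesseract pass. The sketch below is one assumed way to do that. It is not the author's pipeline.

- `easyocr_to_layoutlm` (hypothetical) collapses each four-point EasyOCR polygon into an axis-aligned box, normalized to the 0-1000 range LayoutLMv3 expects.
- `encode_page` (hypothetical) passes the words and boxes to `LayoutLMv3Processor` with `apply_ocr=False`.

Loading the processor from `microsoft/layoutlmv3-base` is also an assumption, since the fine-tuned checkpoint may ship its own processor config.

```python
from typing import List, Sequence, Tuple

# EasyOCR's Reader.readtext returns (polygon, text, confidence) triples,
# where polygon is four [x, y] corner points.
EasyOcrResult = Tuple[Sequence[Sequence[float]], str, float]


def easyocr_to_layoutlm(
    results: Sequence[EasyOcrResult], width: int, height: int, min_conf: float = 0.0
) -> Tuple[List[str], List[List[int]]]:
    """Convert EasyOCR output into LayoutLMv3 words and 0-1000 normalized boxes."""
    words: List[str] = []
    boxes: List[List[int]] = []
    for polygon, text, conf in results:
        if conf < min_conf or not text.strip():
            continue  # skip low-confidence or empty detections
        xs = [p[0] for p in polygon]
        ys = [p[1] for p in polygon]
        # Axis-aligned box around the (possibly rotated) polygon, scaled to 0-1000.
        box = [
            int(1000 * min(xs) / width),
            int(1000 * min(ys) / height),
            int(1000 * max(xs) / width),
            int(1000 * max(ys) / height),
        ]
        # Clamp, because LayoutLMv3 expects coordinates inside [0, 1000].
        boxes.append([min(max(v, 0), 1000) for v in box])
        words.append(text)
    return words, boxes


def encode_page(image_path: str, processor_name: str = "microsoft/layoutlmv3-base"):
    """Run EasyOCR on one page and build model inputs (needs easyocr, transformers, Pillow)."""
    # Imported lazily so the conversion helper above works without these packages.
    import easyocr
    from PIL import Image
    from transformers import LayoutLMv3Processor

    image = Image.open(image_path).convert("RGB")
    reader = easyocr.Reader(["en"])
    words, boxes = easyocr_to_layoutlm(reader.readtext(image_path), *image.size)
    # apply_ocr=False turns off the processor's built-in Tesseract OCR.
    processor = LayoutLMv3Processor.from_pretrained(processor_name, apply_ocr=False)
    return processor(image, words, boxes=boxes, truncation=True, return_tensors="pt")
```

Taking the min and max over all four corners keeps rotated detections fully inside their box, at the cost of some extra padding.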

## Libraries

- transformers 4.25.1
- pytorch-lightning 1.8.6
- torchmetrics 0.11.0
- easyocr 1.6.2
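
To check whether an environment matches the versions listed above, a small standard-library sketch using `importlib.metadata` could compare installed distributions against these pins. `version_mismatches` is a hypothetical helper. It compares version strings exactly, which suits exact pins; ranges would need `packaging.version` instead.

```python
from importlib import metadata
from typing import Callable, Dict, List

# Versions listed in this model card.
PINNED: Dict[str, str] = {
    "transformers": "4.25.1",
    "pytorch-lightning": "1.8.6",
    "torchmetrics": "0.11.0",
    "easyocr": "1.6.2",
}


def version_mismatches(
    pins: Dict[str, str], lookup: Callable[[str], str] = metadata.version
) -> List[str]:
    """List packages whose installed version differs from the pin or is missing."""
    problems: List[str] = []
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: {found} installed, want {wanted}")
    return problems


if __name__ == "__main__":
    for line in version_mismatches(PINNED):
        print(line)
```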