	---
	license: cc-by-nc-sa-4.0
	tags:
	- generated_from_trainer
	datasets:
	- funsd-layoutlmv3
	metrics:
	- precision
	- recall
	- f1
	- accuracy
	model-index:
	- name: OCR-LayoutLMv3
	  results:
	  - task:
	      name: Token Classification
	      type: token-classification
	    dataset:
	      name: funsd-layoutlmv3
	      type: funsd-layoutlmv3
	      config: funsd
	      split: train
	      args: funsd
	    metrics:
	    - name: Precision
	      type: precision
	      value: 0.8988653182042428
	    - name: Recall
	      type: recall
	      value: 0.905116741182315
	    - name: F1
	      type: f1
	      value: 0.9019801980198019
	    - name: Accuracy
	      type: accuracy
	      value: 0.8403661000832046
	---

	# OCR-LayoutLMv3

	This model is a fine-tuned version of [microsoft/layoutlmv3-base](https://huggingface.co/microsoft/layoutlmv3-base) on the funsd-layoutlmv3 dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.9788
	- Precision: 0.8989
	- Recall: 0.9051
	- F1: 0.9020
	- Accuracy: 0.8404
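
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the unrounded precision and recall values in this card's metadata; the variable names are illustrative only.

```python
# Unrounded values copied from the model-index metadata of this card.
precision = 0.8988653182042428
recall = 0.905116741182315

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.4f}")  # 0.9020
```

The result agrees with the reported F1 of 0.9020.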

	## Model description

	LayoutLMv3 is a pre-trained multimodal Transformer for Document AI with unified text and image masking. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model. For example, LayoutLMv3 can be fine-tuned for both text-centric tasks, including form understanding, receipt understanding, and document visual question answering, and image-centric tasks such as document image classification and document layout analysis.

	[LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387)
	Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei, Preprint 2022.




	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- training_steps: 2000
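
To make the `linear` scheduler setting concrete, here is a minimal pure-Python sketch of a linear decay from the listed learning rate to zero over the 2000 training steps. The card does not list any warmup, so `warmup_steps=0` is an assumption, and `linear_lr` is a hypothetical helper written for illustration, not a function from the Transformers library.

```python
def linear_lr(step: int,
              base_lr: float = 1e-5,     # learning_rate from the list above
              total_steps: int = 2000,   # training_steps from the list above
              warmup_steps: int = 0) -> float:  # assumption: not stated in the card
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 500, 1000, 2000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate is halved by step 1000 and reaches zero at step 2000.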

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
	|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
	| No log        | 1.33  | 100  | 0.6966          | 0.7418    | 0.8063 | 0.7727 | 0.7801   |
	| No log        | 2.67  | 200  | 0.5767          | 0.8104    | 0.8644 | 0.8365 | 0.8117   |
	| No log        | 4.0   | 300  | 0.5355          | 0.8246    | 0.8852 | 0.8539 | 0.8295   |
	| No log        | 5.33  | 400  | 0.5240          | 0.8706    | 0.8922 | 0.8813 | 0.8427   |
	| 0.5326        | 6.67  | 500  | 0.6337          | 0.8528    | 0.8778 | 0.8651 | 0.8260   |
	| 0.5326        | 8.0   | 600  | 0.6870          | 0.8698    | 0.8828 | 0.8762 | 0.8240   |
	| 0.5326        | 9.33  | 700  | 0.6584          | 0.8723    | 0.9061 | 0.8889 | 0.8342   |
	| 0.5326        | 10.67 | 800  | 0.7186          | 0.8868    | 0.9031 | 0.8949 | 0.8335   |
	| 0.5326        | 12.0  | 900  | 0.6822          | 0.9040    | 0.9076 | 0.9058 | 0.8526   |
	| 0.1248        | 13.33 | 1000 | 0.7042          | 0.8872    | 0.9021 | 0.8946 | 0.8511   |
	| 0.1248        | 14.67 | 1100 | 0.7920          | 0.9027    | 0.9036 | 0.9032 | 0.8480   |
	| 0.1248        | 16.0  | 1200 | 0.8052          | 0.8964    | 0.9151 | 0.9056 | 0.8389   |
	| 0.1248        | 17.33 | 1300 | 0.8932          | 0.8995    | 0.9066 | 0.9030 | 0.8329   |
	| 0.1248        | 18.67 | 1400 | 0.8728          | 0.8950    | 0.9061 | 0.9005 | 0.8398   |
	| 0.0442        | 20.0  | 1500 | 0.9051          | 0.8960    | 0.9116 | 0.9037 | 0.8347   |
	| 0.0442        | 21.33 | 1600 | 0.9587          | 0.8947    | 0.9031 | 0.8989 | 0.8401   |
	| 0.0442        | 22.67 | 1700 | 0.9822          | 0.9042    | 0.9046 | 0.9044 | 0.8389   |
	| 0.0442        | 24.0  | 1800 | 0.9734          | 0.9043    | 0.9061 | 0.9052 | 0.8391   |
	| 0.0442        | 25.33 | 1900 | 0.9842          | 0.9042    | 0.9091 | 0.9066 | 0.8410   |
	| 0.0225        | 26.67 | 2000 | 0.9788          | 0.8989    | 0.9051 | 0.9020 | 0.8404   |
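
The Epoch and Step columns are related by a fixed number of optimizer steps per epoch. The sketch below infers that number from the table (step 300 falls at epoch 4.0, so 75 steps per epoch) and reproduces the Epoch column. `STEPS_PER_EPOCH` and `epoch_at` are illustrative names, and the training-set size estimate assumes a single device with no gradient accumulation, which the card does not state.

```python
STEPS_PER_EPOCH = 75   # inferred from the table: step 300 corresponds to epoch 4.0
TRAIN_BATCH_SIZE = 2   # train_batch_size from the hyperparameters


def epoch_at(step: int) -> float:
    """Epoch value, rounded to two decimals, for a given optimizer step."""
    return round(step / STEPS_PER_EPOCH, 2)


# Upper bound on the number of training examples; the last batch of an
# epoch may be partial. Assumes one device and no gradient accumulation.
max_train_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

for step in (100, 300, 2000):
    print(step, epoch_at(step))
print("training examples (at most):", max_train_examples)
```

Under these assumptions, the 2000 training steps correspond to roughly 26.67 passes over a training set of at most 150 examples.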


	### Framework versions

	- Transformers 4.25.0.dev0
	- PyTorch 1.12.1
	- Datasets 2.6.1
	- Tokenizers 0.13.1