felipebandeira/donutlicenses3v3

vision-encoder-decoder

image-text-to-text


felipebandeira committed on Aug 23, 2023

Commit f637104 · 1 parent: 8a6267d

Update README.md

Files changed (1)

README.md +13 -2


@@ -9,5 +9,16 @@ metrics:
 pipeline_tag: image-to-text
 ---
-This model extracts information from EU driver's licenses and returns it as JSON. More details about its training and evaluation can be found in the link below. For optimal performance, we recommend that input images have a size of 1192x772.
-https://medium.com/@ofelipebandeira/transformers-vs-ocr-who-can-read-better-192e6b044dd3

+This model extracts information from EU driver's licenses and returns it as JSON. For optimal performance, we recommend that input images:
+- have a size of 1192x772
+- have high resolution and do not contain light reflection effects
+Accuracy
+- on validation set: 98%
+- on set of real licenses: 63.93%
+Article describing model:
+https://medium.com/@ofelipebandeira/transformers-vs-ocr-who-can-read-better-192e6b044dd3
+Article describing synthetic dataset used in training:
+https://python.plainenglish.io/how-to-create-synthetic-datasets-of-document-images-5f140dee5e40
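
The updated README recommends 1192x772 input images but does not say how to get there from an arbitrary photo. The sketch below computes one possible geometry: scale the image to fit inside 1192x772 while keeping its aspect ratio, then center it with padding. The function name `fit_to_target` and the choice of padding over stretching are assumptions made for illustration; the README does not state which preprocessing the model was trained with.

```python
# Recommended input size from the README (width x height).
TARGET_W, TARGET_H = 1192, 772


def fit_to_target(width, height, target_w=TARGET_W, target_h=TARGET_H):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit.

    Hypothetical helper: it only computes the geometry and does not touch
    pixel data, so it can be paired with any imaging library.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")

    # Use the smaller ratio so the scaled image fits inside the target box.
    scale = min(target_w / width, target_h / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))

    # Center the scaled image; leftover space becomes padding.
    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top
```

An image that already has the recommended proportions comes back with zero padding, while a square photo is scaled to the target height and padded on the left and right. If the model turns out to expect a plain stretch to 1192x772, this step can be replaced by a direct resize.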