Diminishing returns with model size
Hey guys,
I love your work. I've been using your model since yesterday and am very impressed. Thanks for making it open source! I see that even the smaller models perform quite well, while the bigger models bring only small improvements on top of that, which are of course still remarkable.
I noticed that whenever an image contains a lot of information or text, the model struggles a bit. I think this is because of the small input resolution, which I assume is the same even in the bigger models. If I may make a suggestion: please train a model that can take up to a megapixel, perhaps in a curriculum-learning fashion where the current model is extended at the end of training to handle four times the pixels (double the width and height). I'm pretty sure that would make it excellent at general OCR.
But regardless, this is amazing, and thanks for contributing to the community!
Best regards,
HD
Hello!
Thank you for your interest. May I ask how you are running our model? If you're using the code in the quick start, there is a max_num parameter that can be used to adjust the input image resolution. The default value is 6, which means that the maximum resolution of the input image is 6x448x448 pixels, for example 896x1344.
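To illustrate what that tile budget does, here is a minimal, self-contained sketch of dynamic tiling under a max_num cap. The helper names below (choose_grid, load_tiles) are illustrative only and are not the actual quick-start code, which ships its own preprocessing helper; the sketch just shows how the tile count caps the effective input resolution.

```python
# Illustrative sketch only: not the actual quick-start preprocessing.
# It shows how a max_num tile budget caps the effective input resolution
# when an image is split into 448x448 tiles.
from PIL import Image

TILE = 448  # tile side length mentioned above


def choose_grid(width: int, height: int, max_num: int) -> tuple[int, int]:
    """Pick a (cols, rows) grid with cols * rows <= max_num whose aspect
    ratio is closest to the input image's aspect ratio."""
    target = width / height
    best, best_diff = (1, 1), float("inf")
    for cols in range(1, max_num + 1):
        for rows in range(1, max_num // cols + 1):
            diff = abs(cols / rows - target)
            if diff < best_diff:
                best, best_diff = (cols, rows), diff
    return best


def load_tiles(path: str, max_num: int = 6) -> list[Image.Image]:
    """Resize the image onto the chosen grid and cut it into 448x448 crops."""
    img = Image.open(path).convert("RGB")
    cols, rows = choose_grid(*img.size, max_num)
    img = img.resize((cols * TILE, rows * TILE))
    return [
        img.crop((c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE))
        for r in range(rows)
        for c in range(cols)
    ]


# With max_num=6, a portrait page might get a 2x3 grid, i.e. 896x1344 pixels;
# raising max_num doubles or quadruples the tile budget for text-heavy images.
tiles = load_tiles("document.jpg", max_num=12)
print(f"{len(tiles)} tiles of {TILE}x{TILE}")
```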
If you're using these models on our online demo, you can adjust max_input_tiles in the Advanced Options sidebar on the left side. You can set it to 24, which means that the input resolution is at most 24x448x448 pixels, or about 4.8 million pixels.
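For reference, the pixel budgets for the two tile counts mentioned above work out as follows (simple arithmetic, nothing model-specific):

```python
# Pixel budget = number of 448x448 tiles times 448 times 448
def pixel_budget(num_tiles: int, tile_size: int = 448) -> int:
    return num_tiles * tile_size * tile_size

print(pixel_budget(6))   # 1204224 -> about 1.2 MP (quick-start default)
print(pixel_budget(24))  # 4816896 -> about 4.8 MP (demo with max_input_tiles=24)
```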
I hope this helps.
Best regards,
Zhe Chen
I was using the default max_num=6 value, and as soon as I bumped it up to 12, the results became much better! Thanks again, man, for the swift response!