---
license: apache-2.0
---
# donut-base-ascii
This is `"naver-clova-ix/donut-base"` with all non-ASCII tokens removed. The model is therefore suited to basic English use cases where the text is primarily `a-zA-Z0-9` and basic punctuation.
The original model, `"naver-clova-ix/donut-base"`, did not have a token for `"1"`, so one has been added. The notebook `remove-donut-tokens.ipynb` details the whole process.
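
As a rough illustration of the idea (not the notebook's actual code), the sketch below filters a toy token-to-id mapping down to ASCII-only tokens and appends a missing `"1"`. The names `is_ascii_token`, `filter_vocab`, and `toy_vocab` are hypothetical, and the sketch assumes a SentencePiece-style vocabulary in which `▁` (U+2581) marks a word boundary and should not count as non-ASCII. In a real model, the embedding rows would also have to be re-indexed to match the new ids.

```python
# Minimal sketch, assuming a SentencePiece-style vocab (token -> id).
# All names here are illustrative; see remove-donut-tokens.ipynb for the real process.

SPM_SPACE = "\u2581"  # SentencePiece word-boundary marker


def is_ascii_token(token: str) -> bool:
    """Return True if the token is pure ASCII once the boundary marker is ignored."""
    return token.replace(SPM_SPACE, "").isascii()


def filter_vocab(vocab: dict, add: tuple = ()) -> dict:
    """Keep ASCII-only tokens in their original order, append any missing
    tokens from `add`, and assign new contiguous ids starting at 0."""
    ordered = sorted(vocab.items(), key=lambda kv: kv[1])
    kept = [tok for tok, _ in ordered if is_ascii_token(tok)]
    for tok in add:
        if tok not in kept:
            kept.append(tok)
    return {tok: new_id for new_id, tok in enumerate(kept)}


# Toy vocabulary standing in for the real tokenizer's vocab.
toy_vocab = {"<s>": 0, "\u2581the": 1, "caf\u00e9": 2, "\u2581\u65e5\u672c": 3, "2": 4, "!": 5}
new_vocab = filter_vocab(toy_vocab, add=("1",))
print(new_vocab)
```

Here the two tokens containing non-ASCII characters are dropped, the remaining tokens keep their relative order, and `"1"` is placed at the end of the vocabulary.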
This model has not been trained any further than the original model.