cjvt/SloBERTa-slo-word-spelling-annotator

Martin97Bozic committed on Nov 23, 2023

Commit 39d8722 · 1 Parent(s): baa6e7e

Create README.md

Files changed (1)

README.md +62 -0

	@@ -0,0 +1,62 @@

+---
+license: cc-by-sa-4.0
+datasets:
+- cjvt/cc_gigafida
+- cjvt/solar3
+- cjvt/sloleks
+language:
+- sl
+tags:
+- word spelling error annotator
+---
+# SloBERTa-Incorrect-Spelling-Annotator
+This SloBERTa model is designed to annotate incorrectly spelled words in text. It uses the following labels:
+- 0: Indicates no detected error (as used in the example below).
+- 1: Indicates incorrectly spelled words.
+- 2: Denotes cases where two words should be written together.
+- 3: Suggests that a word should be written separately.
+## Model Output Example
+Imagine we have the following Slovenian text:
+_Model vbesedilu o znači besede, v katerih se najajajo napake._
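+To prepare this text as input for the SloBERTa model, a `<mask>` token is inserted after every token (words and punctuation marks alike):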
+_Model <mask> vbesedilu <mask> o <mask> znači <mask> besede <mask> , <mask> v <mask> katerih <mask> se <mask> najajajo <mask> napake <mask> . <mask>_
+The model might return the following predictions (note: these predictions were chosen for illustration and are not guaranteed to be reproducible):
+_Model 0 vbesedilu 3 o 2 znači 2 besede 0 , 0 v 0 katerih 0 se 0 najajajo 1 napake 0 . 0_
+We can observe the following:
+1. In the input sentence, the word `najajajo` is spelled incorrectly, so the model marks it with the token (1).
+2. The word `vbesedilu` should be written as two words `v` and `besedilu`, so the model marks it with the token (3).
+3. The words `o` and `znači` should be written as one word `označi`, so the model marks them with the tokens (2).
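The input format and the reading of the predictions described above can be sketched in a few lines of Python. This is only an illustration, not the authors' preprocessing code: the regex tokenizer, the helper names (`tokenize`, `add_masks`, `read_labels`) and the short label descriptions are assumptions, and label `0` is assumed to mean "no error detected". The prediction string is the demonstration output from the example, not a real model call.

```python
import re

MASK = "<mask>"  # mask token as written in the example above

# Short descriptions of the labels listed in this README.
# Assumption: 0 means that no error was detected.
LABELS = {
    "0": "no error",
    "1": "misspelled",
    "2": "write together with neighbouring word",
    "3": "write as separate words",
}


def tokenize(text):
    # Assumption: words and punctuation marks are separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def add_masks(tokens):
    # Insert one <mask> after every token, as in the example input.
    return " ".join(f"{token} {MASK}" for token in tokens)


def read_labels(labelled):
    # Split "token label token label ..." into (token, label) pairs.
    parts = labelled.split()
    return list(zip(parts[0::2], parts[1::2]))


text = "Model vbesedilu o znači besede, v katerih se najajajo napake."
masked = add_masks(tokenize(text))

# Demonstration predictions copied from the example above.
predicted = (
    "Model 0 vbesedilu 3 o 2 znači 2 besede 0 , 0 v 0 "
    "katerih 0 se 0 najajajo 1 napake 0 . 0"
)
flagged = [
    (token, LABELS[label])
    for token, label in read_labels(predicted)
    if label != "0"
]

print(masked)
for token, meaning in flagged:
    print(f"{token}: {meaning}")
```

Under these assumptions, `masked` has the same shape as the masked sentence shown above, and `flagged` keeps only the four tokens with a non-zero label.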
+## More details
+Testing the model on **generated** test sets gives the following result:
+- `1` token prediction -> Precision: 0.979; Recall: 0.986; F1: 0.983
+Testing the model with test sets constructed using the **Šolar** dataset provides the following results (combining detection and correction of words with incorrect spelling):
+- `1` token prediction -> Precision: 0.753; Recall: 0.873; F1: 0.809
+- `2` token prediction -> Precision: 0.516; Recall: 0.671; F1: 0.585
+- `3` token prediction -> Precision: 0.826; Recall: 0.853; F1: 0.839
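For readers who want to relate the three columns, F1 is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition (the README does not say how the scores were computed) and recomputes F1 from the rounded precision and recall values listed above. Because the inputs are already rounded, the recomputed value can differ from the listed F1 in the third decimal place.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall (standard F1 definition).
    return 2 * precision * recall / (precision + recall)


# (precision, recall, listed F1) copied from the results above.
reported = {
    "generated, label 1": (0.979, 0.986, 0.983),
    "Šolar, label 1": (0.753, 0.873, 0.809),
    "Šolar, label 2": (0.516, 0.671, 0.585),
    "Šolar, label 3": (0.826, 0.853, 0.839),
}

# Recompute F1 for every row and keep the absolute gap to the listed value.
recomputed = {}
for name, (precision, recall, listed_f1) in reported.items():
    value = f1_score(precision, recall)
    recomputed[name] = (value, abs(value - listed_f1))
    print(f"{name}: recomputed F1 = {value:.3f}, listed F1 = {listed_f1:.3f}")
```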
+## Acknowledgement
+The authors acknowledge the financial support from the Slovenian Research and Innovation Agency - research core funding No. P6-0411: Language Resources and Technologies for Slovene and research project No. J7-3159: Empirical foundations for digitally-supported development of writing skills.