HamzaNaser/Dialects-to-MSA-Transformer

Text2Text Generation

Dialects Conversion

Text Correction

En-Ar Translation

Inference Endpoints

HamzaNaser committed on Aug 31

Commit d93043f • 1 Parent(s): ec9ee87

Update README.md

Files changed (1)

README.md +1 -1

@@ -36,7 +36,7 @@ Arabic Tweets are selected randomly from the Arabic-Tweets Datasets https://hugg
 ## Dataset Limitations
-The dataset used to train the model consists of random Arabic tweets that, as described by the dataset authors, were not checked, so it may contain grammatically incorrect or semantically incomplete sentences. The text was also normalized in the Arabic-Tweets Datasets, which might make it harder for the model to understand the meaning of some sentences. Even though the quality of the original tweets crawled from the internet was not perfect for our use case, the resulting trained model is relatively good, achieving a BLEU score of ??? on the testing data.
 - The image below shows one of the issues the input dataset might have: missing punctuation can flip the meaning of a sentence completely. For example, if the word "No" appears at the beginning of a sentence, it negates the speech that follows. However, if "No" is followed by a comma, it negates the previous speech and affirms the upcoming sentence instead. The two readings contradict each other completely, and the only difference is a single comma.

 ## Dataset Limitations
+The dataset used to train the model consists of random Arabic tweets that, as described by the dataset authors, were not checked, so it may contain grammatically incorrect or semantically incomplete sentences. The text was also normalized in the Arabic-Tweets Datasets, which might make it harder for the model to understand the meaning of some sentences. Even though the quality of the original tweets crawled from the internet was not perfect for our use case, the resulting trained model is relatively good, achieving a BLEU score of 46.9 on the testing data. A score of 46.9 might be considered very high, but our model's case is different from typical MT: we believe it is easier to translate between dialects of the same language than between completely different languages.
 - The image below shows one of the issues the input dataset might have: missing punctuation can flip the meaning of a sentence completely. For example, if the word "No" appears at the beginning of a sentence, it negates the speech that follows. However, if "No" is followed by a comma, it negates the previous speech and affirms the upcoming sentence instead. The two readings contradict each other completely, and the only difference is a single comma.
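
The punctuation issue described in the README text can be illustrated with a minimal sketch. It assumes a hypothetical normalizer, `strip_punctuation`, that removes every Unicode punctuation character. This is not the normalization actually applied in the Arabic-Tweets Datasets, and the two example sentences are illustrative, not taken from the training data.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Hypothetical normalizer: drop every Unicode punctuation character."""
    kept = (ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # Collapse any whitespace left behind by the removed characters.
    return " ".join("".join(kept).split())


# Illustrative sentences that differ only by an Arabic comma (U+060C).
with_comma = "لا، أريد الذهاب"    # roughly: "No, I want to go"
without_comma = "لا أريد الذهاب"  # roughly: "I do not want to go"

# After normalization the two opposite meanings share one surface form.
collapsed = strip_punctuation(with_comma) == strip_punctuation(without_comma)
print(collapsed)
```

Under these assumptions, both sentences become the same string once the comma is removed, so a model trained on such text has no signal left to tell the two meanings apart.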