HamzaNaser committed
Commit d93043f
1 Parent(s): ec9ee87
Update README.md
README.md CHANGED
@@ -36,7 +36,7 @@ Arabic Tweets are selected randomly from the Arabic-Tweets Datasets https://hugg
## Dataset Limitations
-
The dataset used to train the model consists of random Arabic tweets that, as described by the dataset authors, were not checked, so it is possible to find grammatically incorrect or semantically incomplete sentences. The text was also normalized in the Arabic-Tweets Datasets, which might make it harder for the model to understand the meaning of some sentences. Even though the quality of the original tweets crawled from the internet was not perfect for our use case, the resulting trained model is relatively good, achieving a BLEU score of
- The image below shows one of the issues the input dataset might have: missing punctuation can flip the meaning of a sentence completely. For example, if the word "No" appears at the beginning of a sentence, it negates the speech that follows. However, if "No" is followed by a comma, it negates the previous speech and affirms the upcoming sentence instead. The two readings contradict each other completely, yet they differ only by a single comma.
## Dataset Limitations
+
The dataset used to train the model consists of random Arabic tweets that, as described by the dataset authors, were not checked, so it is possible to find grammatically incorrect or semantically incomplete sentences. The text was also normalized in the Arabic-Tweets Datasets, which might make it harder for the model to understand the meaning of some sentences. Even though the quality of the original tweets crawled from the internet was not perfect for our use case, the resulting trained model is relatively good, achieving a BLEU score of 46.9 on the testing data. A score of 46.9 might be considered very high, but our model's case is different from normal MT, and we believe it is easier to translate between dialects of the same language than between completely different languages.
- The image below shows one of the issues the input dataset might have: missing punctuation can flip the meaning of a sentence completely. For example, if the word "No" appears at the beginning of a sentence, it negates the speech that follows. However, if "No" is followed by a comma, it negates the previous speech and affirms the upcoming sentence instead. The two readings contradict each other completely, yet they differ only by a single comma.