HamzaNaser committed on
Commit
23ea293
1 Parent(s): 0288c93

Update README.md

Files changed (1)
  1. README.md +2 -4
README.md CHANGED
@@ -52,8 +52,7 @@ The Dataset used to train the Model consist of some random Arabic Tweets that wa
52
 
53
  - The image below shows one of the issues the input Dataset might have: missing punctuation can flip the meaning of a sentence completely. For example, if the word "No" appears at the beginning of a sentence, it negates the speech that follows. However, if the word "No" is followed by a comma, it negates the previous speech and affirms the upcoming sentence instead. The two readings contradict each other, and the only difference is a single comma.
54
 
55
- ![Puctuation Issue Example](https://huggingface.co/HamzaNaser/Dialects-to-MSA-Transformer/resolve/main/Punctuations.png)
56
- <img src="https://huggingface.co/HamzaNaser/Dialects-to-MSA-Transformer/resolve/main/Punctuations.png" alt="Alt text" width="500"/>
57
 
58
  - Another example of a Data limitation is that the sentence might be inaccurate or incomplete, making it harder for the GPT model to convert it to MSA or Classical Arabic. Many examples of inconsistent input text can be found in the Dataset the Model was trained on.
59
 
@@ -66,8 +65,7 @@ Classical Arabic sentences used to train the model where generated using GPT Mod
66
  As an estimate based on 200K random samples, most training samples are from the Gulf region. The dialect regions are an estimate obtained by classifying the input Arabic-Tweets Dataset with the DialectIdentifier provided by camel tools https://camel-tools.readthedocs.io/en/latest/api/dialectid.html. We can also say that the Model may work better when converting the dialects of the regions below into MSA. The figure below shows the approximate dialects by region the Model was trained on.
67
 
68
 
69
- ![Dialects by Region](https://huggingface.co/HamzaNaser/Dialects-to-MSA-Transformer/resolve/main/Dialects%20by%20Region.png)
70
-
71
 
72
 
73
  # Other use cases for the model
 
52
 
53
  - The image below shows one of the issues the input Dataset might have: missing punctuation can flip the meaning of a sentence completely. For example, if the word "No" appears at the beginning of a sentence, it negates the speech that follows. However, if the word "No" is followed by a comma, it negates the previous speech and affirms the upcoming sentence instead. The two readings contradict each other, and the only difference is a single comma.
54
 
55
+ <img src="https://huggingface.co/HamzaNaser/Dialects-to-MSA-Transformer/resolve/main/Punctuations.png" alt="Puctuation Issue Example" width="500"/>
 
56
 
57
  - Another example of a Data limitation is that the sentence might be inaccurate or incomplete, making it harder for the GPT model to convert it to MSA or Classical Arabic. Many examples of inconsistent input text can be found in the Dataset the Model was trained on.
58
 
 
65
  As an estimate based on 200K random samples, most training samples are from the Gulf region. The dialect regions are an estimate obtained by classifying the input Arabic-Tweets Dataset with the DialectIdentifier provided by camel tools https://camel-tools.readthedocs.io/en/latest/api/dialectid.html. We can also say that the Model may work better when converting the dialects of the regions below into MSA. The figure below shows the approximate dialects by region the Model was trained on.
66
 
67
 
68
+ <img src="https://huggingface.co/HamzaNaser/Dialects-to-MSA-Transformer/resolve/main/Dialects%20by%20Region.png" alt="Dialects by Region" width="600">
 
69
 
70
 
71
  # Other use cases for the model