license: apache-2.0
language:
- hi
- en
These are the PyTorch model parameters and associated data used to train a small transformer model from scratch.
The model is trained to translate from hindi_latin (Hindi written in Latin script) to English.
The training dataset used to create the model is included among the files. The training data is semi-synthetic.
Steps for creating the dataset:
Obtain actual user questions in Hindi and human translations of them into English.
Prompt GPT to create variations of the key words, taking phonetics into account and giving it a user persona.
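The steps above can be sketched roughly as follows. This is a minimal illustration, not the script used to build the dataset: the names `SeedPair`, `build_variation_prompt`, and `pair_variations` are hypothetical, the prompt wording is invented, and reusing the unchanged English translation for every variation is an assumption. The call to GPT itself is left out, so the sketch only covers assembling the prompt and pairing the returned variations with the translation.

```python
from dataclasses import dataclass


@dataclass
class SeedPair:
    """One seed example: a real user question and its human English translation."""
    question: str
    english: str


def build_variation_prompt(pair: SeedPair, persona: str, n_variations: int = 5) -> str:
    """Assemble a prompt asking for phonetic spelling variations of key words.

    The wording is hypothetical; the actual prompt is not given in this card.
    """
    return (
        f"You are writing as this user persona: {persona}\n"
        f"Original question: {pair.question}\n"
        f"English translation: {pair.english}\n"
        f"Rewrite the question {n_variations} times, varying the spelling of key words "
        "the way they might be typed phonetically. Keep the meaning unchanged.\n"
        "Return one variation per line."
    )


def pair_variations(variations: list[str], pair: SeedPair) -> list[tuple[str, str]]:
    """Attach the unchanged English translation to each non-empty variation.

    Assumption: a spelling variation keeps the meaning, so the translation is reused.
    """
    cleaned = [v.strip() for v in variations if v.strip()]
    return [(v, pair.english) for v in cleaned]
```

For example, with an invented seed pair, `build_variation_prompt(SeedPair("mera balance kitna hai", "What is my balance?"), persona="a college student typing quickly")` returns the prompt text, and the lines that come back from the model can be passed to `pair_variations` to get (source, target) training pairs.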