---
license: apache-2.0
language:
- hi
- en
---

This repository contains the PyTorch model parameters and associated data used to train a small transformer model from scratch.
The model is trained to translate from hindi_latin to English.

The training dataset used to create the model is included among the files. The training data is semi-synthetic.

Steps for creating the dataset:
1. Obtain actual user questions in Hindi and human translations thereof in English.
2. Prompt GPT to create variations of key words, taking phonetics into account and giving it a user persona.
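
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the script actually used: the `SeedPair` class, the prompt wording, the persona string, and both helper functions are hypothetical, and the sketch assumes that each generated variation keeps the English translation of the question it was derived from. The call to GPT itself is left out; only the prompt construction and the pairing of results are shown.

```python
from dataclasses import dataclass


@dataclass
class SeedPair:
    hindi_latin: str  # user question (assumed here to be Hindi typed in Latin script)
    english: str      # human translation of that question


# Hypothetical prompt template; the wording actually used is not documented.
PROMPT_TEMPLATE = (
    "You are {persona}.\n"
    "Rewrite the question below the way you would type it, varying the "
    "spelling of key words according to how they sound.\n"
    "Produce {n} variations, one per line, and keep the meaning unchanged.\n\n"
    "Question: {question}\n"
    "English meaning: {translation}"
)


def build_variation_prompt(pair, persona, n=5):
    """Fill the template for one seed pair and one user persona."""
    return PROMPT_TEMPLATE.format(
        persona=persona,
        n=n,
        question=pair.hindi_latin,
        translation=pair.english,
    )


def pair_variations(pair, variations):
    """Attach the seed's English translation to the seed and each variation.

    Blank lines and exact duplicates are dropped; order is preserved.
    """
    seen = set()
    rows = []
    for candidate in [pair.hindi_latin, *variations]:
        text = candidate.strip()
        if text and text not in seen:
            seen.add(text)
            rows.append((text, pair.english))
    return rows
```

A prompt built with `build_variation_prompt` would be sent to the model once per persona, and the returned lines passed to `pair_variations` to produce (source, target) rows for training.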