Dataset
#1, opened by satpalsr
Can you share the dataset used for it?
Never mind, it's listed.
satpalsr changed discussion status to closed
Ah sorry, just saw this. Yeah, I grabbed it from findnitai/english-to-hinglish and made a static train/test split that I pushed to nateraw/english-to-hinglish. So all credit for the dataset goes to @findnitai :)
For the sake of transparency: when I did the actual run for this model, I removed a few rows (3-4) from the train set of nateraw/english-to-hinglish that had obvious Hinglish profanity (bhen****, etc.). Those examples are still in the version I pushed, though, and I don't have a record of them since I did this manually in a text editor.
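If anyone wants to redo that cleanup in a repeatable way, here is a rough sketch of how it could be scripted instead of done by hand. This is not what was actually run for the model. The blocklist terms are placeholders, the `en` / `hi_ng` field names are assumptions about the row layout, and the helper names `build_pattern` and `split_rows` are made up for illustration. It works on plain dicts so it runs without downloading anything.

```python
import re

# Hypothetical placeholder terms: the rows actually removed were never recorded.
BLOCKLIST = ["badword1", "badword2"]


def build_pattern(terms):
    """Compile a case-insensitive regex that matches any term as a whole word."""
    escaped = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"\b(?:{escaped})\b", re.IGNORECASE)


def split_rows(rows, pattern, text_keys=("en", "hi_ng")):
    """Return (kept, removed) so the removed rows can be saved for the record.

    text_keys are assumed field names; adjust them to the real columns.
    """
    kept, removed = [], []
    for row in rows:
        hit = any(pattern.search(row.get(key, "")) for key in text_keys)
        (removed if hit else kept).append(row)
    return kept, removed


# Tiny in-memory stand-in for the train split.
rows = [
    {"en": "How are you?", "hi_ng": "Aap kaise ho?"},
    {"en": "You are a badword1", "hi_ng": "Tum badword1 ho"},
    {"en": "See you tomorrow", "hi_ng": "Kal milte hain"},
]

kept, removed = split_rows(rows, build_pattern(BLOCKLIST))
print(len(kept), len(removed))
```

Keeping the `removed` list around (for example, writing it to a file) gives you the record of dropped rows that a manual edit in a text editor does not.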