nateraw/llama-2-7b-english-to-hinglish

satpalsr

Dec 14, 2023

Can you share the dataset used for it?

satpalsr

Dec 14, 2023

Never mind, it's listed.

satpalsr changed discussion status to closed Dec 14, 2023

nateraw

Owner Jan 5, 2024

Ah sorry, just saw this. Yeah, I grabbed it from findnitai/english-to-hinglish and split it into a static train/test split that I pushed to nateraw/english-to-hinglish. So all credit for the dataset goes to @findnitai :)

For the sake of transparency, when I did the actual run for this model, I removed a few rows (3-4 rows) from nateraw/english-to-hinglish's train set that had obvious Hinglish profanity (bhen****, etc.). Those examples are still in the version I pushed, though, and I don't have a record of them since I did this manually in a text editor.
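For anyone who wants to do a similar cleanup with a record of what was dropped, here is a minimal sketch. It is not the procedure used for this model (that was a manual edit). The blocklist terms, the `hi_ng` column name, and the helper names are all placeholders assumed for illustration, so check the dataset's actual schema first.

```python
import re

# Hypothetical placeholder terms; the rows actually removed were not recorded.
BLOCKLIST = ["badword1", "badword2"]


def drop_flagged_rows(rows, terms, text_key="hi_ng"):
    """Split rows into (kept, removed) based on a case-insensitive blocklist.

    `text_key` is an assumed column name for the Hinglish text.
    A term matches when it appears at the start of a word.
    """
    if not terms:
        # Nothing to filter on: keep every row.
        return list(rows), []
    escaped = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(rf"\b(?:{escaped})", re.IGNORECASE)
    kept, removed = [], []
    for row in rows:
        if pattern.search(row[text_key]):
            removed.append(row)
        else:
            kept.append(row)
    return kept, removed


# Toy rows standing in for the train split (made-up examples).
rows = [
    {"en": "How are you?", "hi_ng": "Aap kaise ho?"},
    {"en": "placeholder", "hi_ng": "yeh Badword1 hai"},
]
kept, removed = drop_flagged_rows(rows, BLOCKLIST)
print(len(kept), len(removed))
```

Returning the removed rows alongside the kept ones means they can be saved to a file, so the exact filtered set stays reproducible.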
