Parallel synthetic data available?
#2 by xezpeleta
Hi!
In the description I saw the following:
"This model was trained from scratch using Marian NMT on a combination of English-Basque datasets totalling 20,523,431 sentence pairs. 9,033,998 sentence pairs were parallel data collected from the web while the remaining 11,489,433 sentence pairs were parallel synthetic data created using the Google Translate translator."
Is the parallel synthetic data (created using Google Translate) available on HF datasets?
Thanks!!