Parallel synthetic data available?

#2
by xezpeleta - opened

Hi!

In the description I saw the following:

"This model was trained from scratch using Marian NMT on a combination of English-Basque datasets totalling 20,523,431 sentence pairs. 9,033,998 sentence pairs were parallel data collected from the web while the remaining 11,489,433 sentence pairs were parallel synthetic data created using the Google Translate translator."

Is the parallel synthetic data (created using Google Translate) available on HF datasets?

Thanks!!
