Are the datasets for foundational model pre-training publicly accessible?
#10, opened by JayceCeleste
Hello, thanks for the great paper, and publishing the model here!
I noticed that the paper says: "To investigate the impact of species diversity on genome foundational models, we’ve compiled and made publicly available two datasets for foundational model pre-training: the human genome and the multi-species genome."
I tried to find these two datasets but could only find the GUE dataset.
Could you please provide a link to them? Thanks : )