winglian
add streaming dataset support for pretraining datasets
eea2731