---
library_name: transformers
datasets:
- HuggingFaceTB/cosmo2_training_data_subset_1M
---
This is the tokenizer used to train cosmo2. It was trained on 1M samples from: