jasonkrone/olmo_1b_toks_50b-mc-finetune-hpo-lr-with-mmlu Text Generation • Updated 26 days ago • 168
jasonkrone/olmo_1b_toks_126-mc-finetune-hpo-lr-with-mmlu Text Generation • Updated 28 days ago • 163
jasonkrone/olmo_1b_toks_75b-mc-finetune-hpo-lr-with-mmlu Text Generation • Updated 28 days ago • 161
jasonkrone/pythia-1-dot-4b-deduped-111b-toks-mc-finetune-hpo-lr-with-mmlu Text Generation • Updated 28 days ago • 164
jasonkrone/pythia-1-dot-4b-deduped-69b-toks-mc-finetune-hpo-lr-with-mmlu Text Generation • Updated 28 days ago • 160
jasonkrone/pythia-1-dot-4b-deduped-27b-toks-try2-mc-finetune-hpo-lr-with-mmlu Text Generation • Updated 28 days ago • 160
jasonkrone/pythia-1-dot-4b-deduped-111b-toks-mc-finetune-with-mmlu Text Generation • Updated Jan 3 • 191
jasonkrone/hpo_finetune_data_4way_mc_train_max_10k_per_task • Updated Jan 3 • 103k • 119
jasonkrone/mmlu_with_mmlu_pro_train_and_concat_dev_val_for_dev_hpo • Updated Nov 15, 2024 • 21.1k • 115
📀 Dataset comparison models Collection 1.8B models trained on 350BT to compare different pretraining datasets • 8 items • Updated Jun 12, 2024 • 37