Update wandb_log_model on pythia_1_2B_alpaca.yml abddcf4 unverified Viktorius Suwandi committed on May 29, 2023
deepspeed doesn't work with flash-attn, and the gpu savings w flash attn are better than the deepspeed headaches d1aed4c winglian committed on Apr 16, 2023