casts the prepared data to int16 (doesn't help with training memory) 2db9436 winglian committed on Apr 18, 2023
fix lora target module, require explicit flash attention, fix min logging steps, don't use adam8bit for int4, hash prepared datasets, support hf hub datasets 87e073d winglian committed on Apr 17, 2023
deepspeed doesn't work with flash-attn, and the gpu savings w flash attn are better than the deepspeed headaches d1aed4c winglian committed on Apr 16, 2023
add llama 7b config and fix lora_fan_in_fan_out for llama (copy pasta bug) d060c80 winglian committed on Apr 15, 2023
refactor trainer setup to account for deepspeed integration 2df63ef winglian committed on Apr 15, 2023
config chooser, update readme instructions, device config, llama flash attention, debug out the labels, fix config key checks, other bugfixes f2a2029 winglian committed on Apr 14, 2023