Update fastchat_conversation_turns.py (#1294) [skip ci] 2b9687f unverified eltociear committed on Feb 27, 2024
fix steps check for anneal on first cycle (#1316) 2c9c88b unverified winglian committed on Feb 27, 2024
allow the optimizer prune ratio for ReLoRA to be configurable (#1287) 4b997c3 unverified winglian committed on Feb 12, 2024
simplify handling for newer multipack patches so they can be added in a single place (#1270) 5698943 unverified winglian committed on Feb 7, 2024
relora: magnitude pruning of the optimizer (#1245) 8c2e05a unverified winglian committed on Feb 6, 2024
Add shifted sparse attention (#973) [skip-ci] 1d70f24 unverified jrc joecummings winglian committed on Jan 18, 2024
optimize calculation of cu_seqlens from position_ids (#1084) [skip ci] 90036eb unverified winglian committed on Jan 10, 2024
Added chatglm3 conversation type for training models like TinyLLama (#1036) 59b2d30 unverified xaviviro committed on Jan 4, 2024
bump transformers and update attention class map name (#1023) bcc78d8 unverified winglian committed on Jan 3, 2024
remove landmark attn and xpos rope implementations (#1010) 70b46ca unverified winglian committed on Dec 28, 2023
fix: switch to using the HuggingFace Transformers NEFT implementation (#941) ef24342 unverified kallewoof committed on Dec 13, 2023
refactor neft patch to be more re-usable similar to trl's impl (#796) 827ec3d unverified winglian committed on Oct 29, 2023
Mistral: Sliding Window Attention with Flash Attention and Sample Packing (#732) a045db0 unverified casperhansen winglian committed on Oct 16, 2023
flash_attention + sample packing for stablelm 3b (#671) 2d60ba3 unverified winglian committed on Oct 5, 2023
fix for flash attn w mistral w/o sample packing (#648) b2edaae unverified winglian committed on Sep 28, 2023
skip some flash attn patches unless explicitly enabled (#643) 895f0a0 unverified winglian committed on Sep 27, 2023
btlm and falcon monkey patches for flash attn (#566) 6b9b229 unverified winglian committed on Sep 17, 2023
Add training callback to send predictions to WandB table (#521) 5b67ea9 unverified Glavin001 committed on Sep 13, 2023
ReLoRA implementation (with quantization) (#322) bde3c5a unverified chargoddard winglian committed on Aug 24, 2023
fix eval regression caused in 13f7efaf74fcd3c4514277ccb71914c589873f6a a213d99 tmm1 committed on Aug 21, 2023