Add shifted sparse attention (#973) [skip-ci] 1d70f24 jrc joecummings winglian committed on Jan 18, 2024
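For context, shifted sparse attention (S²-Attn, from LongLoRA) is switched on through the project's YAML config. A minimal sketch follows, assuming the flag is named `s2_attention` and is used alongside `flash_attention`; the base model and key names are illustrative, not the PR's definitive interface.

```python
# Minimal sketch (assumed config keys): emit an axolotl-style YAML fragment
# that enables shifted sparse attention for long-context fine-tuning.
import yaml  # pip install pyyaml

config = {
    "base_model": "NousResearch/Llama-2-7b-hf",  # hypothetical base model
    "sequence_len": 8192,      # longer context is what S2-Attn targets
    "flash_attention": True,   # assumed to be enabled together with S2-Attn
    "s2_attention": True,      # assumed flag name for shifted sparse attention
}

print(yaml.safe_dump(config, sort_keys=False))
```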
bump transformers and update attention class map name (#1023) bcc78d8 winglian committed on Jan 3, 2024
add e2e tests for checking functionality of resume from checkpoint (#865) b3a61e8 winglian committed on Nov 16, 2023
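As a usage note (not part of the PR itself), resuming a run is driven by the `resume_from_checkpoint` config key, which is what these e2e tests exercise. The sketch below assumes an existing run config and a saved checkpoint; the file and checkpoint paths are hypothetical.

```python
# Minimal sketch: point an existing config at a saved checkpoint so a
# subsequent run resumes instead of starting from scratch.
import yaml

with open("config.yml") as f:            # hypothetical existing run config
    cfg = yaml.safe_load(f)

cfg["resume_from_checkpoint"] = "./outputs/run-1/checkpoint-500"  # hypothetical path

with open("config.resume.yml", "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```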
simplify by removing duplicate base_model_config (#772) 2d8def6 winglian committed on Oct 23, 2023
Feat: Allow usage of native Mistral FA when no sample_packing (#669) 697c50d Nanobit committed on Oct 4, 2023
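For reference, this change covers the case where flash attention is on but sample packing is off, so the model's native flash-attention path can be used. A minimal config sketch, assuming the standard `flash_attention` and `sample_packing` keys; the model name and sequence length are illustrative.

```python
# Minimal sketch: Mistral config with flash attention enabled and sample
# packing disabled, the combination this change targets.
import yaml

config = {
    "base_model": "mistralai/Mistral-7B-v0.1",
    "flash_attention": True,   # rely on the model's native FA implementation
    "sample_packing": False,   # no packing, so the native path applies
    "sequence_len": 4096,
}

print(yaml.safe_dump(config, sort_keys=False))
```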