Commit History
8df7b88 beta support for multipack with gemmoe (#1402)
2b9687f Update fastchat_conversation_turns.py (#1294) [skip ci]
2c9c88b fix steps check for anneal on first cycle (#1316)
5894f0e make mlflow optional (#1317)
2752d5f multipack for gemma (#1313)
4b997c3 allow the optimizer prune ratio for ReLoRA to be configurable (#1287)
fac2d98 Add MPS support (#1264)
5698943 simplify handling for newer multipack patches so they can be added in a single place (#1270)
8c2e05a relora: magnitude pruning of the optimizer (#1245)
00568c1 support for true batches with multipack (#1230)
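Multipack (sample packing) groups several short training sequences into one batch row so that no row exceeds the maximum sequence length. As a minimal illustrative sketch of the underlying greedy bin-packing idea (a hypothetical helper, not the repository's implementation):

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit-decreasing packing of sequence lengths into bins.

    Each bin is a list of sequence lengths whose sum is <= max_len;
    one bin corresponds to one packed batch row.
    """
    bins = []
    for n in sorted(lengths, reverse=True):  # longest sequences first
        for b in bins:
            if sum(b) + n <= max_len:  # place into the first bin with room
                b.append(n)
                break
        else:
            bins.append([n])  # no existing bin fits; open a new one
    return bins
```

Sorting longest-first tends to keep bins near full, which reduces padding waste compared with packing in arrival order.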
62ca4a2 Respect sliding_window=None (#1214) (DreamGenX)
54d2ac1 Mixtral fixes 20240124 (#1192) [skip ci]
814aee6 Phi2 multipack (#1173)
e799e08 Falcon embeddings (#1149) [skip docker]
f5a828a Qwen2 (#1166)
6910e6a Multipack simplify for Mixtral (#1142)
90036eb optimize calculation of cu_seqlens from position_ids (#1084) [skip ci]
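For packed batches, variable-length attention kernels take cumulative sequence lengths (cu_seqlens) that mark where each packed sequence begins and ends. Because position_ids reset to 0 at the start of every packed sequence, the boundaries can be read off directly. A hypothetical plain-Python sketch of the idea (not the repository's code, which operates on tensors):

```python
def cu_seqlens_from_position_ids(position_ids):
    """Derive cumulative sequence-length boundaries from packed position ids.

    position_ids reset to 0 at the start of each packed sequence, e.g.
    [0, 1, 2, 0, 1, 0, 1, 2, 3] packs sequences of length 3, 2, and 4.
    """
    # every index where position_id == 0 is the start of a sequence
    cu = [i for i, p in enumerate(position_ids) if p == 0]
    cu.append(len(position_ids))  # final boundary at the total token count
    return cu
```

For the example above this yields `[0, 3, 5, 9]`: the slice between consecutive boundaries is exactly one packed sequence.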
59b2d30 Added chatglm3 conversation type for training models like TinyLLama (#1036)
bcc78d8 bump transformers and update attention class map name (#1023)
70b46ca remove landmark attn and xpos rope implementations (#1010)
7bbaac9 fix mistral prompt assembly (#982)
5ada140 Fix prompt assembly for llama (#952)
ef24342 fix: switch to using the HuggingFace Transformers NEFT implementation (#941) (kallewoof)
7fabc4d Mixtral official (#942)
db8a8af adds llama and mistral dropout support (#858)
1470650 various bugfixes (#856)
827ec3d refactor neft patch to be more re-usable similar to trl's impl (#796)
32eeeb5 Hotfix for not saving correctly (#762)
15d3a65 Implement fused modules (#747)
a045db0 Mistral: Sliding Window Attention with Flash Attention and Sample Packing (#732)
3bd9528 add noisy embedding (#721) (Maxime)