src/axolotl/monkeypatch

Commit History

Unsloth gradient checkpointing offload (#1528)
6319da1 · winglian

qwen2_moe support w multipack (#1455)
6086be8 · winglian

fix some of the edge cases for Jamba (#1452)
05b398a · winglian

Remove seq_len arg in rotary_emb (#1443)
e07347b · wenbopan, winglian

beta support for multipack with gemmoe: (#1402)
8df7b88 · winglian

Update fastchat_conversation_turns.py (#1294) [skip ci]
2b9687f · eltociear

fix steps check for anneal on first cycle (#1316)
2c9c88b · winglian

make mlflow optional (#1317)
5894f0e · winglian

multipack for gemma (#1313)
2752d5f · winglian

allow the optimizer prune ratio for ReLoRA to be configurable (#1287)
4b997c3 · winglian

Add MPS support (#1264)
fac2d98 · Maxime, winglian

simplify handling for newer multipack patches so they can be added in a single place (#1270)
5698943 · winglian

relora: magnitude pruning of the optimizer (#1245)
8c2e05a · winglian

support for true batches with multipack (#1230)
00568c1 · winglian

Respect sliding_window=None (#1214)
62ca4a2 · DreamGenX

Mixtral fixes 20240124 (#1192) [skip ci]
54d2ac1 · winglian

Phi2 multipack (#1173)
814aee6 · winglian

Falcon embeddings (#1149) [skip docker]
e799e08 · winglian

Qwen2 (#1166)
f5a828a · winglian

Multipack simplify for Mixtral (#1142)
6910e6a · winglian

Add shifted sparse attention (#973) [skip-ci]
1d70f24 · jrc, joecummings, winglian

optimize calculation of cu_seqlens from position_ids (#1084) [skip ci]
90036eb · winglian

Added chatglm3 conversation type for training models like TinyLLama (#1036)
59b2d30 · xaviviro

bump transformers and update attention class map name (#1023)
bcc78d8 · winglian

remove landmark attn and xpos rope implementations (#1010)
70b46ca · winglian

fix mistral prompt assembly (#982)
7bbaac9 · hamel

Fix prompt assembly for llama (#952)
5ada140 · hamel, tokestermw

fix: switch to using the HuggingFace Transformers NEFT implementation (#941)
ef24342 · dg-kalle

Mixtral official (#942)
7fabc4d · winglian

adds llama and mistral dropout support (#858)
db8a8af · winglian

various bugfixes (#856)
1470650 · winglian

refactor neft patch to be more re-usable similar to trl's impl (#796)
827ec3d · winglian

Hotfix for not saving correctly (#762)
32eeeb5 · casperhansen

Mistral: Sliding Window Attention with Flash Attention and Sample Packing (#732)
a045db0 · casperhansen, winglian

add noisy embedding (#721)
3bd9528 · Maxime

flash_attention + sample packing for stablelm 3b (#671)
2d60ba3 · winglian

fix for flash attn w mistral w/o sample packing (#648)
b2edaae · winglian

Mistral flash attn packing (#646)
b6ab8aa · winglian

skip some flash attn patches unless explicitly enabled (#643)
895f0a0 · winglian

use fastchat conversations template (#578)
e7d3e2d · winglian

update for recent transformers updates (#636)
60c7c48 · winglian

Feat: Add support for upstream FA2 (#626)
19a600a · Nanobit

btlm and falcon monkey patches for flash attn (#566)
6b9b229 · winglian

Add training callback to send predictions to WandB table (#521)
5b67ea9 · Glavin001

reorg a bit
fc8766e · tmm1

use flash_attn rmsnorm when available (#526)
72a6fe1 · tmm1

use flash_attn xentropy when available (#525)
5fe30b1 · tmm1