Commit History
8df7b88 beta support for multipack with gemmoe (#1402)
2b9687f Update fastchat_conversation_turns.py (#1294) [skip ci]
2c9c88b fix steps check for anneal on first cycle (#1316)
5894f0e make mlflow optional (#1317)
2752d5f multipack for gemma (#1313)
4b997c3 allow the optimizer prune ratio for ReLoRA to be configurable (#1287)
fac2d98 Add MPS support (#1264)
5698943 simplify handling for newer multipack patches so they can be added in a single place (#1270)
8c2e05a relora: magnitude pruning of the optimizer (#1245)
00568c1 support for true batches with multipack (#1230)
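Multipack (sample packing) groups several short training sequences into one batch row so that no row exceeds the maximum sequence length. As a minimal illustrative sketch of the underlying greedy bin-packing idea (a hypothetical helper, not the repository's implementation):

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit-decreasing packing of sequence lengths into bins.

    Each bin is a list of sequence lengths whose sum is <= max_len;
    one bin corresponds to one packed batch row.
    """
    bins = []
    for n in sorted(lengths, reverse=True):  # longest sequences first
        for b in bins:
            if sum(b) + n <= max_len:  # place into the first bin with room
                b.append(n)
                break
        else:
            bins.append([n])  # no existing bin fits; open a new one
    return bins
```

Sorting longest-first tends to keep bins near full, which reduces padding waste compared with packing in arrival order.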
62ca4a2 Respect sliding_window=None (#1214) (DreamGenX)
54d2ac1 Mixtral fixes 20240124 (#1192) [skip ci]
814aee6 Phi2 multipack (#1173)
e799e08 Falcon embeddings (#1149) [skip docker]
f5a828a Qwen2 (#1166)
6910e6a Multipack simplify for Mixtral (#1142)
90036eb optimize calculation of cu_seqlens from position_ids (#1084) [skip ci]
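For packed batches, variable-length attention kernels take cumulative sequence lengths (cu_seqlens) that mark where each packed sequence begins and ends. Because position_ids reset to 0 at the start of every packed sequence, the boundaries can be read off directly. A hypothetical plain-Python sketch of the idea (not the repository's code, which operates on tensors):

```python
def cu_seqlens_from_position_ids(position_ids):
    """Derive cumulative sequence-length boundaries from packed position ids.

    position_ids reset to 0 at the start of each packed sequence, e.g.
    [0, 1, 2, 0, 1, 0, 1, 2, 3] packs sequences of length 3, 2, and 4.
    """
    # every index where position_id == 0 is the start of a sequence
    cu = [i for i, p in enumerate(position_ids) if p == 0]
    cu.append(len(position_ids))  # final boundary at the total token count
    return cu
```

For the example above this yields `[0, 3, 5, 9]`: the slice between consecutive boundaries is exactly one packed sequence.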
59b2d30 Added chatglm3 conversation type for training models like TinyLLama (#1036)
bcc78d8 bump transformers and update attention class map name (#1023)
70b46ca remove landmark attn and xpos rope implementations (#1010)
7bbaac9 fix mistral prompt assembly (#982)
5ada140 Fix prompt assembly for llama (#952)
ef24342 fix: switch to using the HuggingFace Transformers NEFT implementation (#941) (kallewoof)
7fabc4d Mixtral official (#942)
db8a8af adds llama and mistral dropout support (#858)
1470650 various bugfixes (#856)
827ec3d refactor neft patch to be more re-usable similar to trl's impl (#796)
32eeeb5 Hotfix for not saving correctly (#762)
15d3a65 Implement fused modules (#747)
a045db0 Mistral: Sliding Window Attention with Flash Attention and Sample Packing (#732)
3bd9528 add noisy embedding (#721) (Maxime)