Commits · Dovakiins/qwerrwe

beta support for multipack with gemmoe: (#1402)

8df7b88
unverified

winglian committed on Mar 14, 2024

add starcoder2 (#1349)

e0f1895
unverified

ehartford

winglian

Nanobit committed on Mar 6, 2024

Update fastchat_conversation_turns.py (#1294) [skip ci]

2b9687f
unverified

eltociear committed on Feb 27, 2024

fix steps check for anneal on first cycle (#1316)

2c9c88b
unverified

winglian committed on Feb 27, 2024

make mlflow optional (#1317)

5894f0e
unverified

winglian committed on Feb 26, 2024

multipack for gemma (#1313)

2752d5f
unverified

winglian committed on Feb 22, 2024

allow the optimizer prune ratio for ReLoRA to be configurable (#1287)

4b997c3
unverified

winglian committed on Feb 12, 2024

Add MPS support (#1264)

fac2d98
unverified

Maxime

winglian committed on Feb 12, 2024

simplify handling for newer multipack patches so they can be added in a single place (#1270)

5698943
unverified

winglian committed on Feb 7, 2024

relora: magnitude pruning of the optimizer (#1245)

8c2e05a
unverified

winglian committed on Feb 6, 2024

support for true batches with multipack (#1230)

00568c1
unverified

winglian committed on Feb 1, 2024

Respect sliding_window=None (#1214)

62ca4a2
unverified

DreamGenX committed on Jan 26, 2024

Mixtral fixes 20240124 (#1192) [skip ci]

54d2ac1
unverified

winglian committed on Jan 24, 2024

Phi2 multipack (#1173)

814aee6
unverified

winglian committed on Jan 23, 2024

Falcon embeddings (#1149) [skip docker]

e799e08
unverified

winglian committed on Jan 23, 2024

Qwen2 (#1166)

f5a828a
unverified

winglian committed on Jan 22, 2024

Multipack simplify for Mixtral (#1142)

6910e6a
unverified

winglian committed on Jan 18, 2024

Add shifted sparse attention (#973) [skip-ci]

1d70f24
unverified

jrc joecummings

winglian committed on Jan 18, 2024

optimize calculation of cu_seqlens from position_ids (#1084) [skip ci]

90036eb
unverified

winglian committed on Jan 10, 2024

Added chatglm3 conversation type for training models like TinyLLama (#1036)

59b2d30
unverified

xaviviro committed on Jan 4, 2024

bump transformers and update attention class map name (#1023)

bcc78d8
unverified

winglian committed on Jan 3, 2024

remove landmark attn and xpos rope implementations (#1010)

70b46ca
unverified

winglian committed on Dec 28, 2023

fix mistral prompt assembly (#982)

7bbaac9
unverified

hamel committed on Dec 21, 2023

Fix prompt assembly for llama (#952)

5ada140
unverified

hamel

tokestermw committed on Dec 14, 2023

fix: switch to using the HuggingFace Transformers NEFT implementation (#941)

ef24342
unverified

kallewoof committed on Dec 13, 2023

Mixtral official (#942)

7fabc4d
unverified

winglian committed on Dec 12, 2023

adds llama and mistral dropout support (#858)

db8a8af
unverified

winglian committed on Nov 15, 2023

various bugfixes (#856)

1470650
unverified

winglian committed on Nov 15, 2023

refactor neft patch to be more re-usable similar to trl's impl (#796)

827ec3d
unverified

winglian committed on Oct 29, 2023

Hotfix for not saving correctly (#762)

32eeeb5
unverified

casperhansen committed on Oct 22, 2023

Implement fused modules (#747)

15d3a65
unverified

casperhansen

winglian committed on Oct 21, 2023

Mistral: Sliding Window Attention with Flash Attention and Sample Packing (#732)

a045db0
unverified

casperhansen

winglian committed on Oct 16, 2023

add noisy embedding (#721)

3bd9528
unverified

Maxime Maxime committed on Oct 13, 2023

flash_attention + sample packing for stablelm 3b (#671)

2d60ba3
unverified

winglian committed on Oct 5, 2023

fix for flash attn w mistral w/o sample packing (#648)

b2edaae
unverified

winglian committed on Sep 28, 2023

Mistral flash attn packing (#646)

b6ab8aa
unverified

winglian committed on Sep 27, 2023

skip some flash attn patches unless explicitly enabled (#643)

895f0a0
unverified

winglian committed on Sep 27, 2023

use fastchat conversations template (#578)

e7d3e2d
unverified

winglian committed on Sep 27, 2023

update for recent transformers updates (#636)

60c7c48
unverified

winglian committed on Sep 27, 2023

Feat: Add support for upstream FA2 (#626)

19a600a
unverified

Nanobit committed on Sep 26, 2023

btlm and falcon monkey patches for flash attn (#566)

6b9b229
unverified

winglian committed on Sep 17, 2023

Add training callback to send predictions to WandB table (#521)

5b67ea9
unverified

Glavin001 committed on Sep 13, 2023

reorg a bit

fc8766e

tmm1 committed on Sep 5, 2023

use flash_attn rmsnorm when available (#526)

72a6fe1
unverified

tmm1 committed on Sep 4, 2023

use flash_attn xentropy when available (#525)

5fe30b1
unverified

tmm1 committed on Sep 4, 2023

fix checkpoints on multigpu (#481)

31f3e71
unverified

winglian committed on Aug 26, 2023

ReLoRA implementation (with quantization) (#322)

bde3c5a
unverified

chargoddard

winglian committed on Aug 24, 2023

fix eval regression caused in 13f7efaf74fcd3c4514277ccb71914c589873f6a

a213d99

tmm1 committed on Aug 21, 2023

is_causal fix for evals?

fbf49a4

winglian committed on Aug 21, 2023

fix evals (#447)

ee26281
unverified

winglian committed on Aug 21, 2023
