Commits · Dovakiins/qwerrwe

Implement fused modules (#747)

15d3a65
unverified

casperhansen and winglian committed on Oct 21, 2023

refactor to set eval_batch_size earlier if unset, so we can warn if mismatched (#662)

2642cae
unverified

winglian committed on Oct 3, 2023

Make dataset_processes configurable (#651)

9ec2077
unverified

corbt committed on Sep 29, 2023

Fix bug when using pretokenized datasets (#652)

590d603
unverified

ich committed on Sep 29, 2023

Feat: Add example for Mistral (#644)

eb41f76
unverified

Nanobit committed on Sep 28, 2023

Fix(cfg): Add validation for save_strategy and eval_strategy (#633)

383f88d
unverified

Nanobit committed on Sep 28, 2023

use fastchat conversations template (#578)

e7d3e2d
unverified

winglian committed on Sep 27, 2023

Feat: Add support for upstream FA2 (#626)

19a600a
unverified

Nanobit committed on Sep 26, 2023

Fix: Fail bf16 check when running on cpu during merge (#631)

cfbce02
unverified

Nanobit committed on Sep 25, 2023

add bf16 check (#587)

131afdb
unverified

winglian committed on Sep 17, 2023

make phi training work with Loras (#588)

62eaee7
unverified

winglian committed on Sep 16, 2023

E2e device cuda (#575)

2414673
unverified

winglian committed on Sep 15, 2023

Model parallel (#538)

f6060a6
unverified

winglian committed on Sep 13, 2023

Add training callback to send predictions to WandB table (#521)

5b67ea9
unverified

Glavin001 committed on Sep 13, 2023

Fix pretraining with iterable/streaming Dataset (#556)

2f586d1
unverified

Jan Philipp Harries committed on Sep 13, 2023

Early stopping metric (#537)

e30f1e3
unverified

winglian committed on Sep 8, 2023

recommend padding when using sample packing (#531)

3437149
unverified

winglian committed on Sep 6, 2023

Add support for GPTQ using native transformers/peft (#468)

3355706
unverified

winglian committed on Sep 5, 2023

move is_llama_derived_model into normalize_config (#524)

44454ae
unverified

tmm1 committed on Sep 4, 2023

ReLoRA implementation (with quantization) (#322)

bde3c5a
unverified

chargoddard and winglian committed on Aug 24, 2023

recast loralayer, norm, lmhead + embed token weights per original qlora (#393)

96deb6b
unverified

winglian committed on Aug 21, 2023

Fix(config): Update handling of deepspeed config (#404)

c01015f
unverified

Nanobit committed on Aug 15, 2023

try to detect accelerate and only use device_map=None in that case (#373)

094fc2c
unverified

tmm1 committed on Aug 13, 2023

improve GPU logging to break out pytorch cache and system mem

7b55fe6

tmm1 committed on Aug 13, 2023

extract module for working with cfg

8cec513

tmm1 committed on Aug 13, 2023
