Commits · Dovakiins/qwerrwe

make sure to use train split if loading from hf

607a4d3

winglian committed on May 22, 2023

fix new dataset prompt tokenizers

0f74464

winglian committed on May 21, 2023

pygmalion dataset prompts format, cached tokenized datasets should be hashed on the tokenizer too

2809f3f

winglian committed on May 21, 2023

tokenization fixes

4ea9a66

winglian committed on May 21, 2023

optionally be able to specify alpaca or chat style prompts

1d5ab84

winglian committed on May 20, 2023

concise multiple choice and tldr summarize

1365073

winglian committed on May 17, 2023

add alpaca multiple choice instruct dataset support

b46bc02

winglian committed on May 17, 2023

move filter to before saving so it doesn't happen everytime, update runpod manual script

0d28df0

winglian committed on May 14, 2023

whoops, gt vs lt

84c7bc4

winglian committed on May 12, 2023

optimize dataloading to use cache, fix model token embedding sizes

aa3c3f9

winglian committed on May 12, 2023

black formatting

2bc1a5b

winglian committed on May 10, 2023

fix conditional so alpaca doesn't choke

a27d594

winglian committed on May 10, 2023

Add CompletionPrompt type

cf68153

Nanobit committed on May 8, 2023

Jeopardy bot! (#17)

a12fb0a
unverified

winglian committed on May 8, 2023

fix dataset handling, support galactica

4a17a4c

winglian committed on Apr 24, 2023

tweaks to data loading, 8 bit adam, accelerate and deepspeed

097d367

winglian committed on Apr 22, 2023

shuffle and split dataset after save/load

4f2584f

winglian committed on Apr 20, 2023

fix sharegpt handling from hf, don't worry about loading llama if using earlier transformers release

8d43785

winglian committed on Apr 20, 2023

various bugfixes

94f5e41

winglian committed on Apr 19, 2023

WIP large refactor to make finetune script a little more manageable (#3)

6045345
unverified

winglian committed on Apr 18, 2023

