Commit History

make sure to use train split if loading from hf
607a4d3

winglian committed on

fix new dataset prompt tokenizers
0f74464

winglian committed on

pygmalion dataset prompts format, cached tokenized datasets should be hashed on the tokenizer too
2809f3f

winglian committed on

tokenization fixes
4ea9a66

winglian committed on

optionally be able to specify alpaca or chat style prompts
1d5ab84

winglian committed on

concise multiple choice and tldr summarize
1365073

winglian committed on

add alpaca multiple choice instruct dataset support
b46bc02

winglian committed on

move filter to before saving so it doesn't happen everytime, update runpod manual script
0d28df0

winglian committed on

whoops, gt vs lt
84c7bc4

winglian committed on

optimize dataloading to use cache, fix model token embedding sizes
aa3c3f9

winglian committed on

black formatting
2bc1a5b

winglian committed on

fix conditional so alpaca doesn't choke
a27d594

winglian committed on

Add CompletionPrompt type
cf68153

Nanobit committed on

Jeopardy bot! (#17)
a12fb0a
unverified

winglian committed on

fix dataset handling, support galactica
4a17a4c

winglian committed on

tweaks to data loading, 8 bit adam, accelerate and deepspeed
097d367

winglian committed on

shuffle and split dataset after save/load
4f2584f

winglian committed on

fix sharegpt handling from hf, don't worry about loading llama if using earlier transformers release
8d43785

winglian committed on

various bugfixes
94f5e41

winglian committed on

WIP large refactor to make finetune script a little more manageable (#3)
6045345
unverified

winglian committed on