Commit History
feat(dataset): add config to keep processed dataset in memory (#1152)
3db5f2f
unverified
Nanobit
committed on
Preprocess dataset size fix (#1131)
7570446
unverified
winglian
committed on
update table for rwkv4 support, fix process count for dataset (#822)
cdc71f7
unverified
winglian
committed on
Correct typos in datasets.py (#639)
d1236f2
unverified
felixonmars
committed on
split completion text to sequence_len (#616)
97d3776
unverified
winglian
committed on
Attention mask and position id fixes for packing (#285)
2bb0b78
unverified
winglian
committed on
feat: use multi-core
45ac7c4
Nanobit
committed on
Fixed pre-commit problems, fixed small bug in logging_config to handle LOG_LEVEL env var
b1f4f7a
theobjectivedad
committed on
Adding logging enhancement
553a86b
theobjectivedad
committed on
pylint for duplicated code for system prompts
7b57ed7
winglian
committed on
add new sharegpt, refactor prompt so it can be customized later, add exception if no data is processed
aac4b76
winglian
committed on
fix packing so that concatenated sequences reset the attention
9b8585d
winglian
committed on
Apply isort then black
37293dc
Nanobit
committed on
Lint datasets
6abb7f6
Nanobit
committed on
Lint and format
392dfd9
Nanobit
committed on
fix new dataset prompt tokenizers
0f74464
winglian
committed on
black formatting
2bc1a5b
winglian
committed on
various bugfixes
94f5e41
winglian
committed on
casts the prepared data to int16 (doesn't help with training memory)
2db9436
winglian
committed on
4bit quantized support (wip)
77fca25
winglian
committed on
various bugfixes
80b2ed2
winglian
committed on
black formatting
a6028d3
winglian
committed on
make it work with pythia in the cloud
8d959a7
winglian
committed on
WIP for axolotl trainer
ce24f5e
winglian
committed on