Update data.py for signature generation (#851) 48630f5 unverified MilesQLi winglian committed on Nov 15, 2023
update table for rwkv4 support, fix process count for dataset (#822) cdc71f7 unverified winglian committed on Nov 5, 2023
catch ConnectionError when checking dataset from HuggingFace (#743) 992d57f unverified Napuh committed on Oct 19, 2023
improve handling of the prepared ds path and other cfg defaults (#701) 1c412c7 unverified winglian committed on Oct 13, 2023
Fix: Future deprecation warning with use_auth_token (#680) 69fac9a unverified Nanobit committed on Oct 5, 2023
prepared dataset caching, other misc fixes (#665) e50a64e unverified winglian committed on Oct 3, 2023
Feat(data): Allow loading local csv and text (#594) 00dce35 unverified Nanobit committed on Sep 17, 2023
support custom field for completion from yml (#580) f7a2263 unverified winglian committed on Sep 15, 2023
remove columns after tokenizing for pretraining (#571) 1157950 unverified winglian committed on Sep 14, 2023
Fix pretraining with iterable/streaming Dataset (#556) 2f586d1 unverified Jan Philipp Harries committed on Sep 13, 2023
support user defined prompters, pretokenized datasets in config, local parquet, local arrow files (#348) d2e7f27 unverified winglian committed on Aug 20, 2023
use context manager to run things on rank0 before others (#397) fc2d6be unverified winglian committed on Aug 15, 2023
Attention mask and position id fixes for packing (#285) 2bb0b78 unverified winglian committed on Aug 12, 2023
experimental llama 2 chat support (#296) 3392270 unverified Jan Philipp Harries committed on Aug 6, 2023
optimize the iteration when tokenizing large datasets (#332) fe28543 unverified winglian committed on Aug 4, 2023
Merge pull request #276 from theobjectivedad/logging_enhancement 6f16c45 unverified winglian committed on Jul 16, 2023
Fixed pre-commit problems, fixed small bug in logging_config to handle LOG_LEVEL env var b1f4f7a theobjectivedad committed on Jul 15, 2023
add new sharegpt, refactor prompt so it can be customized later, add exception if no data is processed aac4b76 winglian committed on Jun 11, 2023