support custom field for completion from yml (#580) f7a2263 unverified winglian committed on Sep 15, 2023
remove columns after tokenizing for pretraining (#571) 1157950 unverified winglian committed on Sep 14, 2023
Fix pretraining with iterable/streaming Dataset (#556) 2f586d1 unverified Jan Philipp Harries committed on Sep 13, 2023
support user defined prompters, pretokenized datasets in config, local parquet, local arrow files (#348) d2e7f27 unverified winglian committed on Aug 20, 2023
use context manager to run things on rank0 before others (#397) fc2d6be unverified winglian committed on Aug 15, 2023
Attention mask and position id fixes for packing (#285) 2bb0b78 unverified winglian committed on Aug 12, 2023
experimental llama 2 chat support (#296) 3392270 unverified Jan Philipp Harries committed on Aug 6, 2023
optimize the iteration when tokenizing large datasets (#332) fe28543 unverified winglian committed on Aug 4, 2023
Merge pull request #276 from theobjectivedad/logging_enhancement 6f16c45 unverified winglian committed on Jul 16, 2023
Fixed pre-commit problems, fixed small bug in logging_config to handle LOG_LEVEL env var b1f4f7a theobjectivedad committed on Jul 15, 2023
add new sharegpt, refactor prompt so it can be customized later, add exception if no data is processed aac4b76 winglian committed on Jun 11, 2023
new hf_use_auth_token setting so login to hf isn't required 1c33eb8 winglian committed on May 28, 2023
be able to use adam bnb 8bit and one cycle scheduler w fsdp 9493b1b winglian committed on May 22, 2023
Update src/axolotl/utils/data.py for spelling 98a6781 unverified winglian Nanobit committed on May 22, 2023
pygmalion dataset prompts format, cached tokenized datasets should be hashed on the tokenizer too 2809f3f winglian committed on May 21, 2023
move filter to before saving so it doesn't happen every time, update runpod manual script 0d28df0 winglian committed on May 14, 2023