Fix pretraining with iterable/streaming Dataset (#556) 2f586d1 unverified Jan Philipp Harries committed on Sep 13, 2023
recommend padding when using sample packing (#531) 3437149 unverified winglian committed on Sep 6, 2023
fix test fixture b/c hf trainer tokenization changed (#464) d5dcf9c unverified winglian committed on Aug 23, 2023
fix fixture for new tokenizer handling in transformers (#428) 8cace80 unverified winglian committed on Aug 17, 2023
Attention mask and position id fixes for packing (#285) 2bb0b78 unverified winglian committed on Aug 12, 2023
experimental llama 2 chat support (#296) 3392270 unverified Jan Philipp Harries committed on Aug 6, 2023
update prompts for open orca to match the paper (#317) 3d4984b unverified winglian committed on Jul 22, 2023
Fixed pre-commit problems, fixed small bug in logging_config to handle LOG_LEVEL env var b1f4f7a theobjectivedad committed on Jul 15, 2023
Merge pull request #214 from OpenAccess-AI-Collective/fix-tokenizing-labels 1925eaf unverified winglian committed on Jun 15, 2023
Update doc for grad_accu and add validation tests for batch size 3c71c8d Nanobit committed on May 31, 2023
fix packing so that concatenated sequences reset the attention 9b8585d winglian committed on May 31, 2023
new hf_use_auth_token setting so login to hf isn't required 1c33eb8 winglian committed on May 28, 2023