- A modification of the cross-entropy loss function designed specifically for training LLMs.
- A twist on standard cross-entropy: it emphasizes outlier prediction errors and dynamically normalizes token-level variance.
- The result is more stable and efficient training, leading to models that generalize better.
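To make the idea concrete, here is a minimal sketch of what such a loss could look like, assuming PyTorch. The function name `outlier_weighted_ce` and the `alpha` knob are illustrative, not the repo's actual API: per-token cross-entropy is normalized by the batch's token-level standard deviation, and tokens whose loss sits far from the mean (outliers) receive extra weight.

```python
import torch
import torch.nn.functional as F

def outlier_weighted_ce(logits, targets, alpha=1.0, eps=1e-6):
    """Illustrative sketch (not the repo's exact implementation).

    logits:  (batch, seq, vocab) raw model outputs
    targets: (batch, seq) integer token ids
    alpha:   how strongly outlier tokens are upweighted
    """
    # Per-token cross-entropy, kept unreduced so each token has its own loss.
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)

    # Dynamically normalize by the token-level variance of this batch.
    std = per_token.std().clamp_min(eps)
    normalized = (per_token - per_token.mean()) / std

    # Upweight outliers: tokens whose loss is far from the batch mean.
    weights = 1.0 + alpha * normalized.abs()

    # Weighted mean keeps the result on the same scale as plain CE.
    return (weights * per_token).sum() / weights.sum()
```

With `alpha=0.0` the weights are all 1 and the loss reduces to ordinary mean cross-entropy, which makes the outlier emphasis easy to ablate.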
Check it out, give it a spin, and let me know what you think!
Licensed under Apache 2.0 and ready to use. Happy training! 🔥🤖