---
license: apache-2.0
---

# LLM360 Research Suite: K2 Loss Spike 1

We encountered two major loss spikes while [training K2](https://huggingface.co/LLM360/K2).

* The first loss spike occurred after checkpoint 160 and lasted about 34 checkpoints. We restarted training from checkpoint 160, and training returned to normal.

* The [second loss spike](https://huggingface.co/LLM360/K2-Spike-2/) occurred at checkpoint 186, after we restarted training to fix the first spike, and lasted about 8 checkpoints.

* For every spike checkpoint, we also uploaded the corresponding normal checkpoint for easy comparison. Each checkpoint is stored in its own branch of this repository.

We are releasing these checkpoints so others can study this interesting phenomenon in large model training.

<img src="loss_spike.png" alt="k2 loss spikes"/>

# Purpose

Loss spikes are still a poorly understood phenomenon. By making these spikes and the associated training details available, we hope others will use these artifacts to further the world's knowledge on this topic.

## First 10 Checkpoints

| Checkpoints 160 to 168 | Checkpoints 170 to 178 |
| ----------- | ----------- |
| [Checkpoint 160](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_160) | [Checkpoint 170](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_170) |
| [Checkpoint 162](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_162) | [Checkpoint 172](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_172) |
| [Checkpoint 164](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_164) | [Checkpoint 174](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_174) |
| [Checkpoint 166](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_166) | [Checkpoint 176](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_176) |
| [Checkpoint 168](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_168) | [Checkpoint 178](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_178) |

To list all branches, clone the repository and run `git branch -a`.
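
For readers who prefer to work from Python rather than a git clone, the following is a minimal sketch of how the checkpoint branches might be located and fetched. It assumes the `huggingface_hub` package is installed and that the spike branches follow the `spike_ckpt_<N>` pattern visible in the table links; the helper names (`spike_branch`, `branch_url`, `list_remote_branches`, `download_checkpoint`) are hypothetical and not part of any LLM360 tooling. The naming of the normal (non-spike) branches is not assumed here; listing the remote branches is the way to discover them.

```python
"""Sketch: locate and fetch K2-Spike-1 checkpoints by branch name."""

REPO_ID = "LLM360/K2-Spike-1"  # repository named in the table links

# The ten spike checkpoints listed in the table (even numbers 160..178).
LISTED_CHECKPOINTS = list(range(160, 180, 2))


def spike_branch(checkpoint: int) -> str:
    """Branch name for a spike checkpoint, following the table's link pattern."""
    return f"spike_ckpt_{checkpoint}"


def branch_url(checkpoint: int) -> str:
    """Browser URL of a spike checkpoint branch on the Hugging Face Hub."""
    return f"https://huggingface.co/{REPO_ID}/tree/{spike_branch(checkpoint)}"


def list_remote_branches() -> list[str]:
    """Return every branch name in the repository (needs network access)."""
    # Imported lazily so the pure helpers above work without the package.
    from huggingface_hub import list_repo_refs

    return sorted(ref.name for ref in list_repo_refs(REPO_ID).branches)


def download_checkpoint(checkpoint: int) -> str:
    """Download one spike branch to the local cache and return its path.

    Assumption: a checkpoint of a 65B-parameter model is large, so check
    available disk space before calling this.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, revision=spike_branch(checkpoint))


if __name__ == "__main__":
    # Print the browser URLs of the ten checkpoints listed in the table.
    for ckpt in LISTED_CHECKPOINTS:
        print(branch_url(ckpt))
```

Calling `list_remote_branches()` is the programmatic counterpart of `git branch -a`, and the `revision` argument is what selects a branch when downloading.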

## Loss Spikes on the LLM360 Evaluation Suite

View all evaluations on our [Weights & Biases dashboard](https://wandb.ai/llm360/K2?nw=7bxe4sz0vv).

## About the LLM360 Research Suite

The LLM360 Research Suite is a comprehensive set of large language model (LLM) artifacts from Amber, CrystalCoder, and K2 for academic and industry researchers to explore LLM training dynamics. Additional resources can be found at llm360.ai.

## Citation

**BibTeX:**

```bibtex
@misc{
  title={LLM360-K2-65B: Scaling Up Open and Transparent Language Models},
  author={The LLM360 Team},
  year={2024},
}
```