IvanHU committed
Commit cb90a53 · verified · 1 Parent(s): ad33855

Update README.md

Files changed (1): README.md (+14 −3)
README.md CHANGED
@@ -11,14 +11,25 @@ Both [**YuLan-Mini**](https://huggingface.co/yulan-team/YuLan-Mini) and **YuLan-
 
 This version includes the optimizer, allowing you to resume training using the Hugging Face Trainer and DeepSpeed Universal Checkpoint.
 
-For easier inference and deployment, we merged the re-parameterized added parameters and scaling factors into the final released models ([**YuLan-Mini**](https://huggingface.co/yulan-team/YuLan-Mini) and **YuLan-Mini-Intermediate-4K**), enabling it to run on the Llama architecture. However, these parameters are still retained in the intermediate checkpoints from the training process.
+| Stage | Curriculum Phase | 4K Context | 28K Context | Optimizer | Inference Arch | LAMBADA `Acc` | GSM8K `Acc` | HumanEval `pass@1` |
+|-----------|------------------|-----------------------------|-------------|-----------|----------------|-------------|-------------|--------------------|
+| Stable | 5 | [YuLan-Mini-Phase5](https://huggingface.co/yulan-team/YuLan-Mini-Phase5) | | | `yulanmini` | 53.85 | 3.41 | 12.26 |
+| Stable | 10 | [YuLan-Mini-Phase10](https://huggingface.co/yulan-team/YuLan-Mini-Phase10) | | | `yulanmini` | 55.00 | 9.57 | 15.95 |
+| Stable | 15 | [YuLan-Mini-Phase15](https://huggingface.co/yulan-team/YuLan-Mini-Phase15) | | | `yulanmini` | 55.81 | 13.81 | 16.99 |
+| Stable | 20 | [YuLan-Mini-Phase20](https://huggingface.co/yulan-team/YuLan-Mini-Phase20) | | ✅ | `yulanmini` | 55.81 | 21.39 | 20.79 |
+| Stable | 25 (1T tokens) | [YuLan-Mini-Before-Annealing](https://huggingface.co/yulan-team/YuLan-Mini-Before-Annealing) | | ✅ | `yulanmini` | 55.67 | 29.94 | 34.06 |
+| | | | | | | | | |
+| Annealing | 26 | YuLan-Mini-4K | | | `llama`\* | 64.72 | 66.65 | 61.60 |
+| Annealing | 27 | | [YuLan-Mini](https://huggingface.co/yulan-team/YuLan-Mini) | | `llama`\* | 65.67 | 68.46 | 64.00 |
+
+\*: For easier inference and deployment, we merged the re-parameterized added parameters and scaling factors into the final released models ([**YuLan-Mini**](https://huggingface.co/yulan-team/YuLan-Mini) and **YuLan-Mini-Intermediate-4K**), enabling them to run on the Llama architecture. However, these parameters are still retained in the intermediate checkpoints from the training process.
 
 ## What you can do with these pre-training resources
 
 1. **Pre-train** your own LLM. You can use [our data](https://huggingface.co/yulan-team/YuLan-Mini-Datasets) and curriculum to train a model that's just as powerful as YuLan-Mini.
 2. Perform your own **learning rate annealing**. During the annealing phase, YuLan-Mini's learning ability is at its peak. You can resume training from [the checkpoint before annealing](https://huggingface.co/yulan-team/YuLan-Mini-Before-Annealing) and use your own dataset for learning rate annealing.
-3. **Fine-tune** the Instruct version of the LLM. You can use the YuLan-Mini base model to train your own Instruct version.
-4. **Training dynamics** research. You can use YuLan-Mini's intermediate checkpoints to explore internal changes during the pre-training process.
+3. **Fine-tune** the Instruct version of the LLM. You can use the [YuLan-Mini](https://huggingface.co/yulan-team/YuLan-Mini) base model to train your own Instruct version.
+4. **Training dynamics** research. You can use YuLan-Mini's [intermediate checkpoints](https://huggingface.co/collections/yulan-team/yulan-mini-676d214b24376739b00d95f3) to explore internal changes during the pre-training process.
 5. **Synthesize** your own data. You can use YuLan-Mini's [data pipeline](https://github.com/RUC-GSAI/YuLan-Mini) to clean and generate your own dataset.
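The table added in this commit notes that the released checkpoints run on the `llama` inference architecture, so they should load with stock `transformers`. Below is a minimal loading sketch, not part of the commit itself; the dtype, prompt, and generation settings are illustrative assumptions, and intermediate `yulanmini`-architecture checkpoints are assumed to additionally require `trust_remote_code=True`.

```python
# Sketch: load a released YuLan-Mini checkpoint (llama inference arch) and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yulan-team/YuLan-Mini"  # final released model from the table above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 is adequate for inference
    device_map="auto",
)

# Simple greedy generation as a smoke test.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For resuming pre-training from the optimizer-bearing checkpoints, the README paragraph above points to the Hugging Face Trainer and DeepSpeed Universal Checkpoint; the standard `Trainer.train(resume_from_checkpoint=...)` entry point, combined with a DeepSpeed config, is the likely route, with exact settings depending on your setup.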