Update ref link in README.md
README.md
CHANGED
@@ -32,7 +32,7 @@ In this section, we introduce the details of training our model, including types
 
 We initialized our model with Mistral's weights, modified the FFN to a ReGLU+ReLU structure, and then continued pre-training on 200B tokens, divided into two phases:
 
-**First phase**: For the proportion of the training corpus, we followed the data mix ratio and sources of the StableLM-3B model, conducting further pre-training on 150B tokens.
+**First phase**: For the proportion of the training corpus, we followed the data mix ratio and sources of the StableLM-3B model ([link](https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo)), conducting further pre-training on 150B tokens.
 
 The following table shows the hyper-parameters we used in our training process.
 
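For context on the FFN change mentioned in the hunk above: ReGLU is commonly understood as a GLU-style feed-forward block whose gate activation is ReLU rather than SiLU. The sketch below shows what such a block could look like under that reading, assuming Mistral-style gate/up/down projections; the class and parameter names (`ReGLUFFN`, `gate_proj`, `up_proj`, `down_proj`, `hidden_size`, `intermediate_size`) are illustrative and not taken from this repository.

```python
import torch
import torch.nn as nn


class ReGLUFFN(nn.Module):
    """Sketch of a ReGLU feed-forward block (assumed reading of "ReGLU+ReLU"):
    Mistral's gate/up/down FFN with the SiLU gate activation replaced by ReLU."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act = nn.ReLU()  # ReLU gate in place of SiLU -> "ReGLU"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReGLU: down( ReLU(gate(x)) * up(x) )
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))
```

Because the ReLU gate zeroes out negative pre-activations, the intermediate representation contains exact zeros, which is the usual motivation for swapping a SwiGLU FFN for a ReGLU one before continued pre-training.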