Update ref link in README.md
README.md
CHANGED
@@ -32,7 +32,7 @@ In this section, we introduce the details of training our model, including types
 
 We initialized our model with Mistral's weights, modified the FFN to a ReGLU+ReLU structure, and then continued pre-training on 200B tokens, divided into two phases:
 
-**First phase**: For the proportion of the training corpus, we followed the data mix ratio and sources of the StableLM-3B model, conducting further pre-training on 150B tokens.
+**First phase**: For the proportion of the training corpus, we followed the data mix ratio and sources of the StableLM-3B model ([link](https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo)), conducting further pre-training on 150B tokens.
 
 The following table shows the hyper-parameters we used in our training process.
 
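For context on the FFN change mentioned in the hunk above: ReGLU is commonly understood as a GLU-style feed-forward block whose gate activation is ReLU rather than SiLU. The sketch below shows what such a block could look like under that reading, assuming Mistral-style gate/up/down projections; the class and parameter names (`ReGLUFFN`, `gate_proj`, `up_proj`, `down_proj`, `hidden_size`, `intermediate_size`) are illustrative and not taken from this repository.

```python
import torch
import torch.nn as nn


class ReGLUFFN(nn.Module):
    """Sketch of a ReGLU feed-forward block (assumed reading of "ReGLU+ReLU"):
    Mistral's gate/up/down FFN with the SiLU gate activation replaced by ReLU."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act = nn.ReLU()  # ReLU gate in place of SiLU -> "ReGLU"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReGLU: down( ReLU(gate(x)) * up(x) )
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))
```

Because the ReLU gate zeroes out negative pre-activations, the intermediate representation contains exact zeros, which is the usual motivation for swapping a SwiGLU FFN for a ReGLU one before continued pre-training.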