Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ However, the widespread adoption of ReLU-based models in the LLM field remains limited.

## Model Architecture

-
+To push the model's sparsity, we add a ReLU component after the GLU component, called dReLU (double ReLU). So our FFN network works as follows:

```Python
class BambooMLP(nn.Module):
@@ -30,7 +30,7 @@ class BambooMLP(nn.Module):

In this section, we introduce the details of training our model, including the types of data used and the hyperparameters.

-We initialized the model weights to Mistral's model weights and modified the FFN structure to the
+We initialized the model weights to Mistral's model weights and modified the FFN structure to the dReLU structure, then continued pre-training for 200B tokens, divided into two phases:

**First phase**: For the proportion of training corpus, we followed the data mix ratio and sources of the StableLM-3B model ([link](https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo)), conducting further pre-training with 150B tokens.
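Since the diff truncates the body of `BambooMLP`, here is a minimal, self-contained sketch of what a dReLU feed-forward block as described above could look like. The class name `DReLUMLP` and the `gate_proj`/`up_proj`/`down_proj`/`hidden_size`/`intermediate_size` names are assumptions borrowed from the common Mistral/Llama convention, not the actual Bamboo source:

```Python
import torch
import torch.nn as nn


class DReLUMLP(nn.Module):
    """Sketch of a dReLU FFN: ReLU is applied to BOTH the gate and the up
    projection ("double ReLU"), replacing the usual SiLU-gated (SwiGLU) form.
    Names and shapes are assumptions, not the exact BambooMLP definition."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act_fn = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The elementwise product is exactly zero wherever either ReLU output
        # is zero, which is what pushes up activation sparsity in the FFN.
        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.act_fn(self.up_proj(x)))
```

The intermediate activation `relu(gate(x)) * relu(up(x))` is zero in every dimension where either projection is negative, so the corresponding rows of `down_proj` can be skipped at inference time; exposing that sparsity is the point of the dReLU change.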