PowerInfer/Bamboo-base-v0_1

@@ -1,14 +1,14 @@
 ## Introduction
-Sparse computing is increasingly recognized as an important direction to improve the computational efficiency of large language models (LLM). Among various approaches, a mixture of experts (MoE) methods (exemplified by models such as [Mixtral]([mistralai/Mixtral-8x7B-v0.1 · Hugging Face](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1))) show particular promise. MoE works by selectively activating different model components (experts), thereby optimizing resource usage.
 Recent studies ([Zhang et al., 2021](https://arxiv.org/abs/2110.01786); [Liu et al., 2023](https://openreview.net/pdf?id=wIPIhHd00i); [Mirzadeh et al., 2023](https://arxiv.org/abs/2310.04564)) reveal that LLMs inherently exhibit properties conducive to sparse computation when employing the ReLU activation function. This insight opens up new avenues for model efficiency, akin to MoE's selective activation. By dynamically choosing model parameters for computation, we can substantially boost efficiency.
-However, the widespread adoption of ReLU-based models in the LLM field remains limited. Here we introduce a new 7B ReLU-based LLM, Bamboo, which boasts nearly 85% sparsity and performance levels on par with [Mistral]([mistralai/Mistral-7B-v0.1 · Hugging Face](https://huggingface.co/mistralai/Mistral-7B-v0.1)).
 ## Model Architecture
-As the ReGLU-based LLM has limited sparsity, for example, [ReLULLaMA]([SparseLLM/ReluLLaMA-7B · Hugging Face](https://huggingface.co/SparseLLM/ReluLLaMA-7B)) has just nearly 67% sparsity. To further push the model's sparsity, we add a relu component after GLU. So our FFN network works as follows:
 ```Python
 class BambooMLP(nn.Module):

 ## Introduction
+Sparse computing is increasingly recognized as an important direction for improving the computational efficiency of large language models (LLMs). Among various approaches, mixture-of-experts (MoE) methods (exemplified by models such as [Mixtral](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)) show particular promise. MoE works by selectively activating different model components (experts), thereby optimizing resource usage.
 Recent studies ([Zhang et al., 2021](https://arxiv.org/abs/2110.01786); [Liu et al., 2023](https://openreview.net/pdf?id=wIPIhHd00i); [Mirzadeh et al., 2023](https://arxiv.org/abs/2310.04564)) reveal that LLMs inherently exhibit properties conducive to sparse computation when employing the ReLU activation function. This insight opens up new avenues for model efficiency, akin to MoE's selective activation. By dynamically choosing model parameters for computation, we can substantially boost efficiency.
+However, adoption of ReLU-based models in the LLM field remains limited. Here we introduce Bamboo, a new 7B ReLU-based LLM with nearly 85% sparsity and performance on par with [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1).
 ## Model Architecture
+ReGLU-based LLMs have limited sparsity; for example, [ReluLLaMA-7B](https://huggingface.co/SparseLLM/ReluLLaMA-7B) achieves just under 67% sparsity. To push the model's sparsity further, we add a ReLU component after the GLU, so our FFN works as follows:
 ```Python
 class BambooMLP(nn.Module):