Difference between SparseLLM/relu and SparseLLM/reglu - lack of modeling file?

#1
by xunkai55 - opened

Hi there,

I'm trying to understand the difference between SparseLLM/relu and SparseLLM/reglu, but their config files look very similar: only intermediate_size differs, and hidden_act is set to relu for both models.

Besides, relu-5b does not seem to work properly. I guess you changed the modeling_llama.py file to make it a true ReLU (ReLU(W_in * X)) rather than ReGLU. Am I understanding this correctly? If so, it would be helpful if you also open-sourced that modeling file. The difference could also be clarified better in the paper.

And thanks for the great work in the relu^2-wins paper!

SparseLLMs org

For the relu2/relu models, we do not have both up and gate projections; we have only a gate projection and a down projection.
For the reglu model, we follow the typical gate, up, and down projections.
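To illustrate the two variants described above, here is a minimal numpy sketch. The weight names (W_gate, W_up, W_down) and dimensions are hypothetical, chosen only to mirror the gate/up/down terminology in this thread, not the actual SparseLLM modeling code:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical toy dimensions for illustration only.
hidden, inter = 4, 8
rng = np.random.default_rng(0)
W_gate = rng.standard_normal((inter, hidden))
W_up = rng.standard_normal((inter, hidden))
W_down = rng.standard_normal((hidden, inter))

def reglu_mlp(x):
    # ReGLU: gate, up, and down projections; the ReLU of the
    # gate path multiplicatively gates the up path.
    return W_down @ (relu(W_gate @ x) * (W_up @ x))

def relu_mlp(x):
    # ReLU variant: no up projection at all; the activation is
    # applied directly, i.e. down(ReLU(gate(x))).
    return W_down @ relu(W_gate @ x)

x = rng.standard_normal(hidden)
print(relu_mlp(x).shape, reglu_mlp(x).shape)
```

Both variants map hidden -> intermediate -> hidden; the ReLU variant simply drops the up projection, which is why only intermediate_size needs to change in the config.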
