Difference between SparseLLM/relu and SparseLLM/reglu - lack of modeling file?
Hi there,
I'm trying to understand the difference between SparseLLM/relu and SparseLLM/reglu, but their config files look very similar: only `intermediate_size` differs, and `hidden_act` is set to `relu` for both models.
Besides, relu-5b does not seem to work properly. I guess you changed the `modeling_llama.py` file so that the FFN is truly a ReLU (`ReLU(W_in * X)`) rather than a ReGLU. Am I understanding correctly? If so, it would be better if you also open-sourced that modeling file. The difference should probably also be clarified in the paper.
And thanks for the great work in the relu^2-wins paper!
For the relu2/relu models, we do not have both up and gate projections; there is just a gate projection and a down projection.
For the reglu model, we follow the typical gate, up, down projection structure.
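To make the structural difference concrete, here is a minimal NumPy sketch of the two FFN forward passes described above. The function and weight names (`ffn_relu`, `w_gate`, `w_up`, `w_down`) are illustrative, not the repository's actual code; this is an assumption-laden sketch, not the released implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def ffn_relu(x, w_gate, w_down):
    # ReLU FFN: down(ReLU(gate(x))) -- only a gate and a down projection,
    # matching the relu2/relu models described in the reply.
    return relu(x @ w_gate) @ w_down

def ffn_reglu(x, w_gate, w_up, w_down):
    # ReGLU FFN: down(ReLU(gate(x)) * up(x)) -- the typical gated structure
    # with gate, up, and down projections, as in the reglu model.
    return (relu(x @ w_gate) * (x @ w_up)) @ w_down

# Example dimensions: hidden size 4, intermediate size 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
w_gate = rng.normal(size=(4, 8))
w_up = rng.normal(size=(4, 8))
w_down = rng.normal(size=(8, 4))

y_relu = ffn_relu(x, w_gate, w_down)
y_reglu = ffn_reglu(x, w_gate, w_up, w_down)
```

Note that the ReGLU variant carries one extra `hidden_size × intermediate_size` weight matrix (`w_up`), which is presumably why the two configs differ only in `intermediate_size` when matching total parameter counts.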