Why is the intermediate_size of Qwen1.5-MoE-A2.7B different from that of Qwen-1.8B?
#5 by ShiKeNLP - opened
Hello,
The intermediate_size of Qwen1.5-MoE-A2.7B is 5632, while Qwen-1.8B's intermediate_size is 11008. May I ask what the relationship between these two intermediate_size values is, and how an MLP layer with size 11008 is upcycled into a fine-grained expert with size 5632?
According to the report, each MLP layer is copied 8 times and each expert is split into 8 fine-grained experts, so it seems moe_intermediate_size should be 1/8 of Qwen-1.8B's intermediate_size. However, moe_intermediate_size is 1408, Qwen-1.8B's intermediate_size is 11008, and 11008 / 1408 ≈ 7.82, so why is the ratio not exactly 8?
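For concreteness, here is a minimal sketch of the comparison I mean, assuming the public Hugging Face checkpoints `Qwen/Qwen-1_8B` and `Qwen/Qwen1.5-MoE-A2.7B`; it just loads both configs and prints the sizes and the ratio:

```python
from transformers import AutoConfig

# Model IDs are my assumption of the two checkpoints being compared.
dense = AutoConfig.from_pretrained("Qwen/Qwen-1_8B", trust_remote_code=True)
moe = AutoConfig.from_pretrained("Qwen/Qwen1.5-MoE-A2.7B")

print("Qwen-1.8B intermediate_size     :", dense.intermediate_size)      # 11008
print("MoE       intermediate_size     :", moe.intermediate_size)        # 5632
print("MoE       moe_intermediate_size :", moe.moe_intermediate_size)    # 1408

# 11008 / 1408 ≈ 7.82, i.e. not an exact factor of 8
print("ratio:", dense.intermediate_size / moe.moe_intermediate_size)
```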
Thank you!
same question
Hi, all! Please see our technical report on that matter: https://arxiv.org/html/2407.10671
jklj077 changed discussion status to closed