Question about MoA
#2, opened by TechxGenus
Congratulations on this amazing work! I noticed that, unlike Mixtral/DeepseekMoE/QwenMoE, this model also adds multiple experts to the attention layers. How would this affect the results?