Question: MHA to MQA Conversion
#27, opened by kirazT
First of all, thanks so much for introducing this project! The findings are really interesting, and I have enjoyed reading the preprints and playing with the code.
A quick question on the MQA (multi-query attention) architecture: I understand that it is a great fit for inference. However, is there a way to convert existing MHA (multi-head attention) checkpoints into MQA-based ones? Since retraining models is expensive, it would be nice to have a way to convert MHA model weights into MQA ones directly.
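To make the question concrete, here is a minimal sketch of the kind of conversion I have in mind: averaging the per-head key (or value) projection weights into a single shared head while leaving the query projections untouched. Everything here is an assumption for illustration, not something this project provides: the helper name `mean_pool_heads`, the row layout (each head's rows stored contiguously), and the choice of mean pooling itself. I would also expect a model converted this way to need some further training before its quality recovers.

```python
from typing import List

Matrix = List[List[float]]  # row-major: [out_features][in_features]


def mean_pool_heads(weight: Matrix, num_heads: int) -> Matrix:
    """Collapse a per-head K or V projection into a single shared head.

    `weight` is assumed to have num_heads * head_dim rows, with the rows of
    each head stored contiguously (head 0 first). This layout is an
    assumption; real checkpoints may fuse or order Q/K/V differently.
    """
    if len(weight) % num_heads != 0:
        raise ValueError("row count must be divisible by num_heads")
    head_dim = len(weight) // num_heads
    in_features = len(weight[0])
    pooled = []
    for d in range(head_dim):
        # Average row d of every head, column by column.
        row = [
            sum(weight[h * head_dim + d][c] for h in range(num_heads)) / num_heads
            for c in range(in_features)
        ]
        pooled.append(row)
    return pooled


# Toy example: 2 heads, head_dim 2, hidden size 3.
k_proj = [
    [1.0, 2.0, 3.0],  # head 0, dim 0
    [4.0, 5.0, 6.0],  # head 0, dim 1
    [3.0, 4.0, 5.0],  # head 1, dim 0
    [6.0, 7.0, 8.0],  # head 1, dim 1
]
k_shared = mean_pool_heads(k_proj, num_heads=2)
print(k_shared)  # [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
```

The same pooling would apply to the value projection and to any key/value biases, so the shared head ends up with `head_dim` rows instead of `num_heads * head_dim`.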
Thanks for your help! I look forward to your thoughts on this.