Original:
attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)

Fix:
attn_output = attn_output.reshape(bsz, q_len, 4096)

Here hidden_size is not equal to head_dim times the number of heads, so the original reshape does not go through. Changing the value to 4096 gets past it.
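A possible way to avoid hard-coding 4096 is to derive the last dimension from the tensor itself. The sketch below is only an illustration under assumptions: it uses NumPy instead of the model's real tensors, it assumes attn_output has shape (bsz, q_len, num_heads, head_dim) at this point, and the sizes (32 heads, head_dim 128, hidden_size 5120) and the helper name merge_heads are hypothetical, chosen so that the product comes out to 4096.

```python
import numpy as np

def merge_heads(attn_output, bsz, q_len):
    """Collapse the (num_heads, head_dim) axes into a single axis.

    Assumes attn_output has shape (bsz, q_len, num_heads, head_dim).
    merge_heads is a hypothetical helper, not part of any library.
    """
    num_heads, head_dim = attn_output.shape[-2:]
    # Use num_heads * head_dim rather than hidden_size or a literal 4096.
    return attn_output.reshape(bsz, q_len, num_heads * head_dim)

# Hypothetical sizes where hidden_size != num_heads * head_dim.
bsz, q_len, num_heads, head_dim, hidden_size = 2, 3, 32, 128, 5120
attn_output = np.zeros((bsz, q_len, num_heads, head_dim))

merged = merge_heads(attn_output, bsz, q_len)
print(merged.shape)  # (2, 3, 4096)

# Reshaping to hidden_size fails because the element counts differ.
try:
    attn_output.reshape(bsz, q_len, hidden_size)
    reshape_to_hidden_size_failed = False
except ValueError:
    reshape_to_hidden_size_failed = True
```

With PyTorch tensors the same reshape call should apply; the point is only that the last dimension has to be the product of the head count and head_dim, whatever hidden_size is set to.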