About the unusual attention_mask of ChatGLM
Hi,
Thanks for your wonderful work.
I found that the attention mask of ChatGLM uses 1 to indicate indices that are masked and 0 to indicate indices that are not masked. This differs from Hugging Face's implementation (see [1]), which uses 1 for tokens that are not masked. Although this is an implementation choice, ChatGLM's attention mask may cause unexpected problems. For example, it is incompatible with the Prompt-Tuning and P-Tuning methods provided by Hugging Face's PEFT library (see [2]). I wonder whether there is a plan to fix this.
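For illustration, here is a minimal sketch (not ChatGLM's actual code) of the difference between the two conventions for a sequence that ends in two padding tokens:

```python
import numpy as np

# Hugging Face convention (see [1]): 1 = attend to this token, 0 = masked (e.g. padding)
hf_mask = np.array([1, 1, 1, 0, 0])

# ChatGLM's convention is the inverse: 1/True = masked, 0/False = not masked
chatglm_style_mask = 1 - hf_mask
print(chatglm_style_mask)  # [0 0 0 1 1]
```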
Looking forward to your reply.
Sincerely.
[1] https://github.com/huggingface/transformers/blob/5a71977b8b95d39834f07a1f739305e354bc05d0/src/transformers/models/bert/modeling_bert.py#L828
[2] https://github.com/huggingface/peft/blob/cc82b674b5db38b9a393463d38afe66e8f48ac1c/src/peft/peft_model.py#L728
I also noticed the unusual attention_mask of THUDM/chatglm-6b; here are my findings.
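For context, the tokenizers compared below were loaded roughly like this (a sketch; trust_remote_code is only needed for the ChatGLM models):

```python
from transformers import AutoTokenizer

# swap in 'THUDM/chatglm2-6b' or 'bert-base-chinese' for the other rows below
tokenizer = AutoTokenizer.from_pretrained('THUDM/chatglm-6b', trust_remote_code=True)
```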
kwargs = {
    'max_length': 5,
    'padding': True,
    'truncation': True,
    'add_special_tokens': False,
}
text = '汉'
tokenizer(text, **kwargs)
- ChatGLM-6B
{'input_ids': [5, 64876], 'attention_mask': array([[[False, False],
[False, False]]]), 'position_ids': array([[0, 1],
[0, 0]])}
- ChatGLM2-6B
{'input_ids': [30910, 55313], 'attention_mask': [1, 1], 'position_ids': [0, 1]}
- bert-base-chinese
{'input_ids': [3727], 'token_type_ids': [0], 'attention_mask': [1]}
False means NOT masked here, and int(False) is 0, which might be where the 0 comes from.
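A quick check of that polarity, using the mask printed above (a sketch, not code from the repo):

```python
import numpy as np

mask = np.array([[[False, False], [False, False]]])  # ChatGLM-6B mask from above: False = not masked
print(mask.astype(int))  # [[[0 0]
                         #   [0 0]]] -> not-masked positions become 0, the reverse of the usual 1
```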
Another thing: the shape of the attention_mask is unusual as well. It is (1, 2, 2), whereas it should be (2,).
The code that generates this attention_mask is here:
attention_mask = np.ones((1, seq_length, seq_length))
attention_mask = np.tril(attention_mask)
attention_mask[:, :, :context_length] = 1
attention_mask = np.bool_(attention_mask < 0.5)
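Tracing that code through for the example above reproduces the all-False mask (a sketch; seq_length = 2 and context_length = 2 are my assumptions for this 2-token input with add_special_tokens=False):

```python
import numpy as np

seq_length, context_length = 2, 2  # assumed values for the 2-token example above

attention_mask = np.ones((1, seq_length, seq_length))  # start from all ones
attention_mask = np.tril(attention_mask)               # lower-triangular (causal) part
attention_mask[:, :, :context_length] = 1              # context positions fully visible
attention_mask = np.bool_(attention_mask < 0.5)        # invert: True now means "masked"

print(attention_mask)  # [[[False False]
                       #   [False False]]] -> matches the tokenizer output above
```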
To convert this attention_mask to a normal one, I used the following code:
attention_mask = np.where([m[0][-1] for m in attention_mask], 0, 1)
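For completeness, here is how that conversion behaves on the example above (a sketch; I am assuming attention_mask is a batch of the per-example (1, seq_len, seq_len) boolean masks, so m[0][-1] picks the last row of each example's mask):

```python
import numpy as np

# batch of ChatGLM-6B style masks, one (1, seq_len, seq_len) array per example (as printed above)
attention_mask = [np.array([[[False, False], [False, False]]])]

# last row per example: True (masked) -> 0, False (not masked) -> 1
normal_mask = np.where([m[0][-1] for m in attention_mask], 0, 1)
print(normal_mask)  # [[1 1]] -> matches the flat [1, 1] mask that ChatGLM2-6B and BERT return
```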