Incomplete embeddings

#7
by szm913 - opened

The tokenizer reports vocab_size == 2454, while the exported embeddings have the following dimensions:

import onnx
model = onnx.load("converted/onnx/embed_tokens.onnx")
graph = model.graph
for init in graph.initializer:
    print(init.name, init.dims)

Output:
/Constant_9_output_0 []
/Constant_12_output_0 []
/Constant_15_output_0 [1]
/Constant_23_output_0 [3]
/Constant_1_output_0 []
onnx::MatMul_161 [1, 1024]
/Constant_5_output_0 [1]
/Constant_10_output_0 [1]
/Unsqueeze_14_output_0 [1]
/Mul_output_0 [2]
/ConstantOfShape_output_0 [2]
text_emb.weight [2352, 1024]
speech_emb.weight [8194, 1024]
text_pos_emb.weight [2050, 1024]
speech_pos_emb.weight [4100, 1024]

This makes the model incompatible with some inputs: text_emb.weight has only 2352 rows, so the 102 token ids from 2352 to 2453 that the tokenizer can produce have no embedding.
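For illustration, a minimal NumPy sketch of the failure mode (the shapes and vocab size are taken from the dump above; the actual gather inside embed_tokens.onnx fails the same way when handed an out-of-range id):

```python
import numpy as np

# Embedding table with the same shape as text_emb.weight above.
text_emb = np.zeros((2352, 1024), dtype=np.float32)

vocab_size = 2454  # tokenizer's reported vocab_size

# Ids below 2352 look up fine.
in_range = np.array([0, 100, 2351])
print(text_emb[in_range].shape)  # (3, 1024)

# Any id in [2352, 2453] has no row to gather.
try:
    text_emb[[2453]]
except IndexError as e:
    print("lookup failed:", e)
```

So any tokenized input containing one of those last 102 ids cannot be embedded by the exported model.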
