Incomplete embeddings

#7
by szm913 - opened

The tokenizer reports vocab_size == 2454, while the exported embeddings have the following dimensions:

import onnx
model = onnx.load("converted/onnx/embed_tokens.onnx")
graph = model.graph
for init in graph.initializer:
    print(init.name, init.dims)

Output:
/Constant_9_output_0 []
/Constant_12_output_0 []
/Constant_15_output_0 [1]
/Constant_23_output_0 [3]
/Constant_1_output_0 []
onnx::MatMul_161 [1, 1024]
/Constant_5_output_0 [1]
/Constant_10_output_0 [1]
/Unsqueeze_14_output_0 [1]
/Mul_output_0 [2]
/ConstantOfShape_output_0 [2]
text_emb.weight [2352, 1024]
speech_emb.weight [8194, 1024]
text_pos_emb.weight [2050, 1024]
speech_pos_emb.weight [4100, 1024]

This makes the model incompatible with some inputs: text_emb.weight has only 2352 rows, so the 102 token ids from 2352 to 2453 that the tokenizer can produce have no embedding.
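For illustration, a minimal NumPy sketch of the failure mode (the shapes and vocab size are taken from the dump above; the actual gather inside embed_tokens.onnx fails the same way when handed an out-of-range id):

```python
import numpy as np

# Embedding table with the same shape as text_emb.weight above.
text_emb = np.zeros((2352, 1024), dtype=np.float32)

vocab_size = 2454  # tokenizer's reported vocab_size

# Ids below 2352 look up fine.
in_range = np.array([0, 100, 2351])
print(text_emb[in_range].shape)  # (3, 1024)

# Any id in [2352, 2453] has no row to gather.
try:
    text_emb[[2453]]
except IndexError as e:
    print("lookup failed:", e)
```

So any tokenized input containing one of those last 102 ids cannot be embedded by the exported model.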
