ONNX model output
The ONNX output is different from the output of Transformers and SentenceTransformers. I checked the ONNX model using netron.app: the `text_embeds` and `13049` outputs correspond to the `sequence_output` and `pooled_output` of the `XLMRobertaModel`. However, when I compared them with the model loaded from `model.safetensors`, I found that the results were different.
Hi @Riddler2024 , have you tried running inference the way we demonstrate in the README? If there's a slight difference, it could be because ONNX uses fp32, while ST or HF may use bf16 when running in a GPU environment. If this doesn't resolve the issue, please share a code snippet so I can reproduce the behavior.
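To check whether a mismatch is just fp32-vs-bf16 precision noise (as suggested above) rather than a real discrepancy, one can compare the two embedding arrays numerically. This is a minimal sketch with synthetic vectors; the function name and tolerance are my own, not from the repository:

```python
import numpy as np

def compare_embeddings(onnx_emb, hf_emb):
    """Return the max absolute difference and per-row cosine similarity.

    Small max differences with cosine similarity ~1.0 indicate a
    precision-only mismatch (e.g. bf16 vs fp32), not a logic bug.
    """
    onnx_emb = np.asarray(onnx_emb, dtype=np.float32)
    hf_emb = np.asarray(hf_emb, dtype=np.float32)
    max_diff = np.abs(onnx_emb - hf_emb).max()
    cos = (onnx_emb * hf_emb).sum(axis=-1) / (
        np.linalg.norm(onnx_emb, axis=-1) * np.linalg.norm(hf_emb, axis=-1)
    )
    return max_diff, cos

# Synthetic example: the second array is the first plus tiny noise,
# mimicking a precision-level difference between two backends.
rng = np.random.default_rng(0)
a = rng.random((2, 8)).astype(np.float32)
noise = (rng.random((2, 8)).astype(np.float32) - 0.5) * 1e-4
max_diff, cos = compare_embeddings(a, a + noise)
```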
I am using the example code from the README. I extracted the ONNX model output `13049` and compared it with the output of `XLMRobertaModel.forward` in the xlm-roberta-flash-implementation repository; none of the outputs were normalized.
@Riddler2024, OK, I see the issue. The ONNX model mimics the `forward` function, which doesn't apply any normalization by itself; however, both the HF `encode` method and SentenceTransformers include a normalization step. This is why the outputs differ. I'd suggest applying the normalization yourself after running inference. Would that be convenient for your application?