Discrepancy in model sizes
Hello team! The unquantized ONNX model is 133 MB, whereas the PyTorch model is only 66.8 MB. This is unusual: for example, all-MiniLM-L6-v2's unquantized ONNX model is about 90 MB, roughly the same as its PyTorch model.
While this isn't a problem in itself, I wanted to raise it for further investigation.
Edit: I found that Xenova has also uploaded his own version of this model, here, and it has the same issue.
@varun4
I was confused by this at first too. The PyTorch checkpoint for gte-small is stored in 16-bit precision, unlike many other models that use 32-bit. The non-quantized ONNX export is always 32-bit, and the quantized one is 8-bit. That's why the non-quantized ONNX model is double the size of the PyTorch model, and the quantized ONNX model is half its size.
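A quick back-of-the-envelope check bears this out: model file size is roughly parameter count times bytes per parameter. Below is a small sketch; the ~33.4M parameter count for gte-small is an assumption inferred from the 66.8 MB fp16 checkpoint, not an official figure.

```python
# Rough size estimate: parameters * bytes-per-dtype.
# Assumed: ~33.4M params, inferred from the 66.8 MB fp16 checkpoint.
num_params = 66.8e6 / 2  # fp16 stores 2 bytes per parameter

sizes_mb = {
    "pytorch (fp16)": num_params * 2 / 1e6,  # 2 bytes/param
    "onnx (fp32)": num_params * 4 / 1e6,     # 4 bytes/param
    "onnx quantized (int8)": num_params * 1 / 1e6,  # 1 byte/param
}

for name, mb in sizes_mb.items():
    print(f"{name}: {mb:.1f} MB")
```

The fp32 estimate (~133.6 MB) lines up with the 133 MB ONNX file, and the int8 estimate (~33.4 MB) is half the PyTorch size, matching the pattern described above.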
That makes sense, thank you!