ONNX Conversion script

#10

by ha1772007 - opened Oct 7, 2024

Can you provide the script that was used to convert this model to q4?


The ONNX files were contributed without a conversion script by HuggingFace staff member @Xenova here, so you may want to ping @Xenova directly.

I believe he uses quantize.py. In particular, I think these lines handle the q4 quantization: https://github.com/xenova/transformers.js/blob/v3/scripts/quantize.py#L188-L208
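For anyone unsure what q4 means here, below is a rough sketch of block-wise 4-bit weight quantization in plain Python. This is not the code from quantize.py and not the ONNX Runtime implementation. The function names, the asymmetric scale/zero-point scheme, and `block_size=32` are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Block = Tuple[List[int], float, int]  # (4-bit codes, scale, zero_point)


def quantize_block_4bit(block: List[float]) -> Block:
    """Asymmetric 4-bit quantization of one block of weights (illustrative)."""
    # Widen the range to include 0.0 so that zero stays exactly representable.
    lo = min(min(block), 0.0)
    hi = max(max(block), 0.0)
    # 4 bits give 16 levels, i.e. integer codes 0..15.
    scale = (hi - lo) / 15.0 if hi > lo else 1.0
    zero_point = max(0, min(15, round(-lo / scale)))
    codes = [max(0, min(15, round(x / scale) + zero_point)) for x in block]
    return codes, scale, zero_point


def dequantize_block_4bit(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map 4-bit codes back to approximate float weights."""
    return [(q - zero_point) * scale for q in codes]


def quantize_weights(weights: List[float], block_size: int = 32) -> List[Block]:
    """Quantize a flat weight list block by block; block_size is a hypothetical choice."""
    return [
        quantize_block_4bit(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]
```

Each block keeps its own scale and zero point, so an outlier weight only degrades precision inside its own block. Real tooling additionally packs two 4-bit codes per byte, which this sketch leaves out.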

P.S. Are you getting good results with that quantization?

Yes, quantization gives a good speedup, especially on CPU.

comparison between float32 and float16 -> 99% similarity
comparison between float32 and int8 -> 97% similarity

I calculated the similarity as cosine similarity over 80+ text pieces, each about 2,000 characters long.
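A comparison like this can be sketched roughly as follows. It assumes the model outputs one fixed-length vector per text, and that you have already collected the outputs of the float32 model and of the quantized model for the same texts. The helper names are made up for illustration, and the vectors in the usage lines are placeholders, not real model outputs.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_similarity(
    reference: Sequence[Sequence[float]],
    candidate: Sequence[Sequence[float]],
) -> float:
    """Average per-text cosine similarity between two sets of model outputs."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    scores = [cosine_similarity(r, c) for r, c in zip(reference, candidate)]
    return sum(scores) / len(scores)


# Placeholder vectors standing in for float32 and quantized outputs.
fp32_outputs = [[0.12, -0.40, 0.88], [0.55, 0.10, -0.31]]
quant_outputs = [[0.11, -0.42, 0.87], [0.56, 0.08, -0.30]]
print(f"mean cosine similarity: {mean_similarity(fp32_outputs, quant_outputs):.4f}")
```

Averaging hides outliers, so it may also be worth looking at the minimum per-text score.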

ha1772007 changed discussion status to closed Oct 8, 2024

ha1772007 changed discussion status to open Oct 8, 2024

spacemanidol changed discussion status to closed Nov 22, 2024
