ChatGLM-6B + ONNX

This model is exported from ChatGLM-6b with int8 quantization and optimized for ONNXRuntime inference. The export code is available in this repo.
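
For reference, a u8s8 dynamic quantization like the one used here can be produced with ONNXRuntime's quantization tooling. The snippet below is only an illustrative sketch with placeholder file names; it is not the actual export script from the linked repo:

from onnxruntime.quantization import quantize_dynamic, QuantType

# Weights are stored as int8; activations are quantized to uint8 at runtime
# via DynamicQuantizeLinear, producing u8s8 MatMulInteger nodes.
quantize_dynamic(
    model_input="chatglm-6b-fp32.onnx",   # placeholder: exported full-precision graph
    model_output="chatglm-6b-u8s8.onnx",  # placeholder: quantized output
    weight_type=QuantType.QInt8,
)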

Inference code for ONNXRuntime is uploaded with the model. Install the requirements and run streamlit run web-ui.py to start chatting. Currently, the MatMulInteger (for the u8s8 data type) and DynamicQuantizeLinear operators are only supported on CPU, so inference is CPU-only. Arm64 machines with Neon support (Apple M1/M2) should be reasonably fast.
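
If you prefer to load the ONNX graph directly instead of using the bundled web UI, a minimal ONNXRuntime session looks like the sketch below. The model file name here is an assumption; use the actual .onnx file shipped in this repo. The quantized operators restrict execution to the CPU provider:

import onnxruntime as ort

# MatMulInteger (u8s8) and DynamicQuantizeLinear only have CPU kernels,
# so the session is pinned to the CPU execution provider.
session = ort.InferenceSession(
    "chatglm-6b-int8.onnx",  # placeholder: the quantized model file in this repo
    providers=["CPUExecutionProvider"],
)
print([i.name for i in session.get_inputs()])  # inspect the expected model inputs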

Usage

Clone with git-lfs:

git lfs clone https://huggingface.co/K024/ChatGLM-6b-onnx-u8s8
cd ChatGLM-6b-onnx-u8s8
pip install -r requirements.txt
streamlit run web-ui.py

Or use the huggingface_hub Python client library to download a snapshot of the repo:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="K024/ChatGLM-6b-onnx-u8s8", local_dir="./ChatGLM-6b-onnx-u8s8")
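
After the snapshot is downloaded, install the requirements and run streamlit run web-ui.py from the local directory, exactly as in the git-lfs steps above.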

The code is released under the MIT license.

Model weights are released under the same license as ChatGLM-6b; see MODEL LICENSE.
