Updates

Added the argument use_safetensors=False to from_quantized(), since this argument defaulted to False in earlier versions of AutoGPTQ. If you run into problems loading the model directly through Hugging Face, try git clone instead. (Dec 15, 2023)
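As a sketch, the AutoGPTQ loading path described above might look like the following. The model path is a placeholder for wherever this checkpoint lives locally or on the Hub; from_quantized() and use_safetensors are part of the AutoGPTQ API.

```python
def load_quantized(model_path, device="cuda:0"):
    """Load a GPTQ int4 checkpoint with AutoGPTQ.

    Sketch only: model_path is a placeholder for this repo's
    checkpoint. The import is deferred so the function can be
    defined without auto-gptq installed.
    """
    from auto_gptq import AutoGPTQForCausalLM

    # use_safetensors=False matches the old AutoGPTQ default, so
    # checkpoints saved as .bin files load as expected.
    return AutoGPTQForCausalLM.from_quantized(
        model_path,
        device=device,
        use_safetensors=False,
    )
```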

Description

This repo contains the int4 (GPTQ) quantized model for AceGPT-7B-Chat.

The int4 version shows some performance degradation. For a better user experience, please use the fp16 version; see AceGPT-7B-Chat and AceGPT-13B-Chat for details.

How to use this GPTQ model from Python code

Install the necessary packages

Requires: Transformers 4.32.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.

pip3 install "transformers>=4.32.0" "optimum>=1.12.0"  # See requirements.py for verified versions. Quoting prevents the shell from treating >= as a redirect.
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7

You can then launch a simple Gradio web demo with web_quant.py:

python web_quant.py --model-name ${model-path}

You can get more details at https://github.com/FreedomIntelligence/AceGPT/tree/main
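Beyond the Gradio demo, the checkpoint can also be used directly from Python via Transformers, which detects the GPTQ quantization config and delegates to Optimum/AutoGPTQ. A minimal sketch, assuming a placeholder model path and prompt (not taken from this card):

```python
def generate(model_path, prompt, max_new_tokens=128):
    """Generate text from the quantized model via Transformers.

    Sketch only: model_path is a placeholder for this repo's
    checkpoint. Imports are deferred so the function can be
    defined without the heavyweight dependencies installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # Transformers >= 4.32 reads the GPTQ config from the checkpoint
    # and dispatches the quantized layers automatically.
    model = AutoModelForCausalLM.from_pretrained(
        model_path, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```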
