|
---
quantized_by: nisten
pipeline_tag: text-generation
language:
- en
license_link: https://huggingface.co/huihui-ai/Qwen2.5-Coder-7B-Instruct-abliterated/blob/main/LICENSE
tags:
- chat
- abliterated
- uncensored
- AWQ
- 4bit
base_model: huihui-ai/Qwen2.5-Coder-7B-Instruct-abliterated
license: apache-2.0
---
|
|
|
## Use this as a draft model; the quant code is provided below. Love you all.
|
|
|
A 4-bit AWQ quant of https://huggingface.co/huihui-ai/Qwen2.5-Coder-7B-Instruct-abliterated
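
Since the card suggests using this as a draft model, here is a minimal speculative-decoding sketch with vLLM. The 32B target model and the `speculative_model`/`num_speculative_tokens` arguments are assumptions on my part (vLLM's speculative-decoding API has changed across versions), not something shipped with this repo:

```python
# Hedged sketch: pair this small AWQ quant as a draft model with a larger
# target model via vLLM speculative decoding. Argument names follow vLLM's
# older speculative-decoding API and may differ in your installed version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",  # larger target model (assumption)
    speculative_model="q7awqlocaldirname",    # this AWQ quant as the draft (local path from the script below)
    num_speculative_tokens=5,                 # draft tokens proposed per decoding step
)

out = llm.generate(["Write a quicksort in Python."], SamplingParams(max_tokens=256))
print(out[0].outputs[0].text)
```

Draft and target must share a tokenizer/vocabulary, which the Qwen2.5-Coder family does.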
|
|
|
The code used to quantize it:
|
```python
from datasets import load_dataset
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'huihui-ai/Qwen2.5-Coder-7B-Instruct-abliterated'
quant_path = 'q7awqlocaldirname'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load the full-precision model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibration data: flatten each "chosen" conversation from a coding-focused
# preference dataset into plain "role: content" text
def load_openhermes_coding():
    data = load_dataset("alvarobartt/openhermes-preferences-coding", split="train")
    samples = []
    for sample in data:
        responses = [f'{response["role"]}: {response["content"]}' for response in sample["chosen"]]
        samples.append("\n".join(responses))
    return samples

# Quantize to 4-bit AWQ, calibrating on the coding conversations above
model.quantize(
    tokenizer,
    quant_config=quant_config,
    calib_data=load_openhermes_coding(),
    # MODIFY these parameters if need be:
    # n_parallel_calib_samples=32,
    # max_calib_samples=128,
    # max_calib_seq_len=4096
)

# Save the quantized model and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')
```
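
After the script finishes, a quick way to sanity-check the saved quant is to load it back and generate. This is a minimal sketch assuming AutoAWQ's standard `from_quantized` API and a CUDA device; the prompt is just an example:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = 'q7awqlocaldirname'

# Reload the quantized weights; fuse_layers speeds up inference on CUDA
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path, trust_remote_code=True)

# Build a chat-formatted prompt and generate a short completion
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a Python function that reverses a string."}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```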