Introducing AutoRound INT4 algorithm
Hello,
First and foremost, I would like to express my gratitude for your exceptional work and for sharing your model with the community. We have recently applied AutoRound to your model and achieved better results for the INT4 model. The accuracies are below, all tested with real quantized models in the same environment on zero-shot tasks.
| Metric | BF16 | 01-ai/Yi-6B-Chat-4bits | INT4 (AutoRound) |
|---|---|---|---|
| Avg. | 0.6043 | 0.5867 | 0.5939 |
| mmlu | 0.6163 | 0.6133 | 0.6119 |
| cmmlu | 0.7431 | 0.7312 | 0.7314 |
| ceval | 0.7355 | 0.7155 | 0.7281 |
| gsm8k | 0.3222 | 0.2866 | 0.3040 |
Unfortunately, we are unable to upload the quantized model due to licensing constraints. We would therefore appreciate it if you could generate it yourself by following the recipe links, and we are here to provide assistance. Additionally, we would be grateful if you would consider using our method to generate quantized models for your new models in the future.
Hi,
As our company has a very strict legal review process that typically takes a long time, we cannot upload the model, at least for now. We would therefore appreciate it if you could try to generate it yourself.
Besides, we now support local data and combinations of different datasets for calibration, e.g. `--dataset "./tmp.json,NeelNanda/pile-10k"`. Using your own training data for calibration may lead to better accuracy. A gentle reminder: for now we drop samples shorter than `args.seqlen`.
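To make the seqlen reminder concrete: samples whose tokenized length is below the threshold are discarded rather than padded, so very short entries in a local calibration file contribute nothing. A minimal sketch of that filtering (this is an illustration, not AutoRound's actual code; the toy whitespace tokenizer stands in for a real one):

```python
# Illustrative sketch of the sample-dropping rule described above:
# calibration samples with fewer than `seqlen` tokens are discarded.
# (Not AutoRound's actual implementation.)
def filter_calibration_samples(samples, tokenize, seqlen):
    """Keep only samples with at least `seqlen` tokens."""
    return [s for s in samples if len(tokenize(s)) >= seqlen]

# Toy tokenizer: whitespace split stands in for a real tokenizer.
samples = ["a b c d", "a b", "one two three four five"]
kept = filter_calibration_samples(samples, str.split, seqlen=3)
print(kept)  # the 2-token sample "a b" is dropped
```

If your local `./tmp.json` contains many short entries, checking their token counts against `args.seqlen` before calibration avoids silently shrinking the calibration set.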
Got it. I will use your method to generate a quantized model and upload it!
Thank you for your kind understanding.