|
--- |
|
datasets: |
|
- yahma/alpaca-cleaned |
|
--- |
|
# Platypus2-70B-instruct-4bit-gptq |
|
|
|
Platypus2-70B-instruct-4bit-gptq is a quantized version of [`garage-bAInd/Platypus2-70B-instruct`](https://huggingface.co/garage-bAInd/Platypus2-70B-instruct) created with GPTQ quantization.
|
This model is only 35 GB, compared with the original garage-bAInd/Platypus2-70B-instruct at 127 GB, and can run on a single A6000 GPU.
|
|
|
|
|
### Model Details |
|
|
|
* **Quantized by**: [`Mohamad Alhajar`](https://www.linkedin.com/in/muhammet-alhajar/)
|
* **Model type:** quantized version of Platypus2-70B-instruct using 4-bit GPTQ quantization
|
* **Language(s)**: English |
|
|
|
### Prompt Template |
|
``` |
|
### Instruction: |
|
|
|
<prompt> (without the <>) |
|
|
|
### Response: |
|
``` |
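
For reference, a minimal helper that fills this template could look like the sketch below; the `build_prompt` name is ours for illustration, not part of the model or its repository.

```python
def build_prompt(instruction: str) -> str:
    # Wrap a user instruction in the Alpaca-style template shown above.
    return f"### Instruction:\n\n{instruction}\n\n### Response:"
```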
|
|
|
### Training Dataset |
|
|
|
`Platypus2-70B-instruct-4bit-gptq` was quantized with GPTQ using the Alpaca dataset [`yahma/alpaca-cleaned`](https://huggingface.co/datasets/yahma/alpaca-cleaned) as calibration data.
|
|
|
### Training Procedure |
|
|
|
`garage-bAInd/Platypus2-70B-instruct` was quantized with GPTQ on 2× L40 48 GB GPUs.
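
For context, a GPTQ run of this kind could be reproduced roughly as sketched below with `auto-gptq` and `datasets`; the calibration-set size, group size, and prompt formatting here are illustrative assumptions, not the exact settings used for this model.

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model_id = "garage-bAInd/Platypus2-70B-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Build a small calibration set from yahma/alpaca-cleaned (128 examples is an assumption)
calib = load_dataset("yahma/alpaca-cleaned", split="train").select(range(128))
examples = [
    tokenizer(f"### Instruction:\n\n{row['instruction']}\n\n### Response:\n{row['output']}")
    for row in calib
]

# 4-bit GPTQ configuration; group_size/desc_act are illustrative choices
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(base_model_id, quantize_config)
model.quantize(examples)  # run GPTQ using the calibration examples
model.save_quantized("Platypus2-70B-instruct-4bit-gptq", use_safetensors=True)
```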
|
|
|
## How to Get Started with the Model |
|
First, install `auto-gptq`:
|
```shell |
|
pip install auto-gptq
|
``` |
|
|
|
Then use the following code sample to interact with the model:
|
```python |
|
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "malhajar/Platypus2-70B-instruct-4bit-gptq"

# Load the quantized weights and place the model on the first GPU
model = AutoGPTQForCausalLM.from_quantized(model_id,
                                           inject_fused_attention=False,
                                           use_safetensors=True,
                                           trust_remote_code=False,
                                           use_triton=False,
                                           quantize_config=None,
                                           device="cuda:0")

tokenizer = AutoTokenizer.from_pretrained(model_id)

question = "Who was the first person to walk on the moon?"

# Fill the prompt template and generate a response
prompt = f'''
### Instruction:

{question}

### Response:'''

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids)
response = tokenizer.decode(output[0])

print(response)
|
``` |
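
The `generate` call above uses default decoding settings; it also accepts the usual Hugging Face generation arguments if you want more control. The values below are illustrative, not settings recommended by the model author.

```python
# Illustrative decoding settings (not the model author's defaults)
output = model.generate(
    input_ids,
    max_new_tokens=256,  # limit the length of the answer
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```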
|
|
|
### Citations |
|
```bibtex |
|
@article{platypus2023, |
|
title={Platypus: Quick, Cheap, and Powerful Refinement of LLMs}, |
|
author={Ariel N. Lee and Cole J. Hunter and Nataniel Ruiz}, |
|
booktitle={arXiv preprint arxiv:2308.07317}, |
|
year={2023} |
|
} |
|
``` |
|
```bibtex |
|
@misc{touvron2023llama, |
|
title={Llama 2: Open Foundation and Fine-Tuned Chat Models}, |
|
author={Hugo Touvron and Louis Martin and Kevin Stone and Peter Albert and Amjad Almahairi and Yasmine Babaei and Nikolay Bashlykov and others},
year={2023},
|
eprint={2307.09288}, |
|
archivePrefix={arXiv}, |
|
} |
|
``` |
|
```bibtex |
|
@misc{frantar2023gptq, |
|
title={GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers}, |
|
author={Elias Frantar and Saleh Ashkboos and Torsten Hoefler and Dan Alistarh}, |
|
year={2023}, |
|
eprint={2210.17323}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG} |
|
} |
|
``` |
|
|