---
base_model: Ichsan2895/Merak-7B-v4
license: llama2
datasets:
- allenai/c4
language:
- id
tags:
- gptq
- mistral
- indonesia
inference: false
---

# Merak-7B-v4 GPTQ

[Merak-7B-v4 GPTQ](https://huggingface.co/daptheHuman/Merak-7B-v4-GPTQ) is the GPTQ-quantized version of [Ichsan2895/Merak-7B-v4](https://huggingface.co/Ichsan2895/Merak-7B-v4). The [c4/id](https://huggingface.co/datasets/allenai/c4/blob/main/multilingual/c4-id.tfrecord-00000-of-01024.json.gz) dataset was used as calibration data for the quantization process; a sketch of that step is given at the end of this card.

## Python code example: inference from this GPTQ model

### Install the necessary packages

Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.

```shell
pip3 install --upgrade transformers optimum
# If using PyTorch 2.1 + CUDA 12.x:
pip3 install --upgrade auto-gptq
# or, if using PyTorch 2.1 + CUDA 11.x:
pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
```

If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Likewise, if you have problems with the pre-built wheels, try building from source:

```shell
pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
git checkout v0.5.1
pip3 install .
```

### Example Python code

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "daptheHuman/Merak-7B-v4-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Tell me about AI"
prompt_template = f'''### Instruction:
{prompt}

### Response:
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
```

## Credits

[TheBloke](https://huggingface.co/TheBloke/) for the README template.

[asyafiqe](https://huggingface.co/asyafiqe/) for the v3-GPTQ inspiration.
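
## Quantization sketch

The calibration step with c4/id mentioned above is not shown in code. Below is a minimal sketch of how a comparable GPTQ quantization could be run with AutoGPTQ. The calibration sample size and the `bits`/`group_size`/`desc_act` settings are assumptions for illustration, not a record of the exact settings used for this repo; check this repo's `quantize_config.json` for the actual values.

```python
from datasets import load_dataset
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "Ichsan2895/Merak-7B-v4"
tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)

# Calibration data: a small sample from the Indonesian C4 shard linked above.
# The 256-example sample size is an assumption, not the recorded setting.
calib = load_dataset(
    "json",
    data_files="https://huggingface.co/datasets/allenai/c4/resolve/main/multilingual/c4-id.tfrecord-00000-of-01024.json.gz",
    split="train[:256]",
)
examples = [
    tokenizer(text, truncation=True, max_length=2048)
    for text in calib["text"]
]

# 4-bit, group size 128, no act-order (assumed parameters).
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

# Load the full-precision base model, quantize it against the calibration
# examples, and save the GPTQ weights as safetensors.
model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model.quantize(examples)
model.save_quantized("Merak-7B-v4-GPTQ", use_safetensors=True)
```

Quantization needs the full-precision model in memory, so expect it to require a GPU with considerably more VRAM than inference on the quantized weights.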