	---
	base_model: Ichsan2895/Merak-7B-v4
	license: llama2
	datasets:
	- allenai/c4
	language:
	- id
	tags:
	- gptq
	- mistral
	- indonesia
	inference: false
	---

	# Merak-7B-v4 GPTQ
	<!-- markdownlint-disable MD041 -->

	<!-- header start -->
	<!-- 200823 -->
	<div style="margin-left: auto; margin-right: auto">
	<img src="https://i.imgur.com/aMm54ZY.jpg" alt="Merak" style="width: 300px; margin:auto">
	</div>
	<hr style="margin-top: 1.0em; margin-bottom: 1.0em;">
	<!-- header end -->

	The [c4/id](https://huggingface.co/datasets/allenai/c4/blob/main/multilingual/c4-id.tfrecord-00000-of-01024.json.gz) dataset was used for the quantization process.

	[Merak-7B-v4 GPTQ](https://huggingface.co/daptheHuman/Merak-7B-v4-GPTQ) is a GPTQ-quantized version of [Ichsan2895/Merak-7B-v4](https://huggingface.co/Ichsan2895/Merak-7B-v4).

	## Python code example: inference from this GPTQ model

	### Install the necessary packages

	Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.

	```shell
	pip3 install --upgrade transformers optimum
	# If using PyTorch 2.1 + CUDA 12.x:
	pip3 install --upgrade auto-gptq
	# or, if using PyTorch 2.1 + CUDA 11.x:
	pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
	```
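
After installing, the minimum versions listed above can be checked from Python. The following is a minimal sketch, not part of the original model card: the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are hypothetical, and the comparison only looks at the leading numeric part of each version string, so pre-release suffixes are ignored.

```python
import re
from importlib import metadata

# Minimum versions stated in this README.
MINIMUMS = {
    "transformers": "4.33.0",
    "optimum": "1.12.0",
    "auto-gptq": "0.4.2",
}


def parse_version(text):
    """Return the leading numeric components, e.g. '0.4.2+cu118' -> (0, 4, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, required):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUMS):
    """Map each package to (installed_version_or_None, requirement_met)."""
    report = {}
    for name, required in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, required))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "ok" if ok else f"needs >= {MINIMUMS[name]}"
        print(f"{name}: {installed or 'not installed'} ({status})")
```

Running the script prints one line per package, flagging anything that is missing or older than the stated minimum.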

	If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Likewise, if you have problems with the pre-built wheels, try building from source:

	```shell
	pip3 uninstall -y auto-gptq
	git clone https://github.com/PanQiWei/AutoGPTQ
	cd AutoGPTQ
	git checkout v0.5.1
	pip3 install .
	```
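
The install routes above depend on the PyTorch and CUDA versions. As an illustrative sketch only, the decision can be written as a small function. The name `suggest_autogptq_install` is hypothetical, the commands are copied from this README, and any combination the README does not mention returns `None` rather than a guess.

```python
from typing import Optional

# Commands copied from the install section of this README.
CUDA12_WHEEL = "pip3 install --upgrade auto-gptq"
CUDA11_WHEEL = (
    "pip3 install --upgrade auto-gptq "
    "--extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/"
)
SOURCE_BUILD = "\n".join([
    "pip3 uninstall -y auto-gptq",
    "git clone https://github.com/PanQiWei/AutoGPTQ",
    "cd AutoGPTQ",
    "git checkout v0.5.1",
    "pip3 install .",
])


def suggest_autogptq_install(torch_version: str,
                             cuda_version: Optional[str]) -> Optional[str]:
    """Map a PyTorch/CUDA pair to the install route this README describes.

    Returns None for combinations the README does not cover.
    """
    # "2.1.0+cu118" -> (2, 1); local build tags after "+" are ignored.
    release = torch_version.split("+")[0].split(".")
    major_minor = (int(release[0]), int(release[1]))

    if major_minor == (2, 0):
        return SOURCE_BUILD
    if major_minor == (2, 1) and cuda_version:
        if cuda_version.startswith("12."):
            return CUDA12_WHEEL
        if cuda_version.startswith("11."):
            return CUDA11_WHEEL
    return None
```

With PyTorch installed, `torch.__version__` and `torch.version.cuda` can supply the two arguments; `torch.version.cuda` is `None` on CPU-only builds, which this sketch treats as not covered.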

	### Example Python code

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

	model_name_or_path = "daptheHuman/Merak-7B-v4-GPTQ"

	# To use a different branch, change revision
	# For example: revision="gptq-4bit-32g-actorder_True"
	model = AutoModelForCausalLM.from_pretrained(
	    model_name_or_path,
	    device_map="auto",
	    trust_remote_code=False,
	    revision="main",
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

	# Wrap the user prompt in the instruction/response template
	prompt = "Tell me about AI"
	prompt_template = f"### Instruction:\n{prompt}\n### Response:\n"

	print("\n\n*** Generate:")
	input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()
	output = model.generate(
	    inputs=input_ids,
	    temperature=0.7,
	    do_sample=True,
	    top_p=0.95,
	    top_k=40,
	    max_new_tokens=512,
	)
	print(tokenizer.decode(output[0]))

	# Inference can also be done using transformers' pipeline
	print("*** Pipeline:")
	pipe = pipeline(
	    "text-generation",
	    model=model,
	    tokenizer=tokenizer,
	    max_new_tokens=512,
	    do_sample=True,
	    temperature=0.7,
	    top_p=0.95,
	    top_k=40,
	    repetition_penalty=1.1,
	)
	print(pipe(prompt_template)[0]["generated_text"])
	```
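
The decoded output of `model.generate` contains the prompt template as well as the model's answer. The sketch below shows one way to build the template and keep only the text after `### Response:`. The helper names `build_prompt` and `extract_response` are hypothetical, and treating `</s>` as the end-of-sequence marker is an assumption about this model's tokenizer, not something stated in this README.

```python
RESPONSE_MARKER = "### Response:"
EOS_TOKEN = "</s>"  # assumption: the tokenizer's end-of-sequence string


def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the template used in the example above."""
    return f"### Instruction:\n{instruction}\n{RESPONSE_MARKER}\n"


def extract_response(decoded: str) -> str:
    """Return only the text generated after the response marker."""
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    if not found:
        # Marker missing: fall back to the whole decoded string.
        return decoded.strip()
    return tail.replace(EOS_TOKEN, "").strip()
```

For example, `extract_response(tokenizer.decode(output[0]))` would return just the answer portion of the generation from the example above.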


	## Credits
	- [TheBloke](https://huggingface.co/TheBloke/) for the README template.
	- [asyafiqe](https://huggingface.co/asyafiqe/) for the v3-GPTQ inspiration.