---
datasets: wikitext
license: other
license_link: https://llama.meta.com/llama3/license/
---
This is a quantized version of [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), produced with GPTQ, a quantization method developed by [IST Austria](https://ist.ac.at/en/research/alistarh-group/).
The quantization used the following configuration:
- Bits: 4
- Act order: True
- Group size: 128
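
To give a feel for what this configuration means for storage, the sketch below estimates the effective number of bits per quantized weight. It assumes a common GPTQ checkpoint layout in which every group of weights shares one 16-bit scale and one zero point stored at the quantization bit width. The function name and that layout are assumptions for illustration, not details read from this model's files. The small per-input-channel index that act order adds is ignored, and the estimate covers only the quantized linear layers.

```python
def effective_bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Estimate storage per weight for group-wise quantization.

    Assumption (hypothetical layout): each group of `group_size` weights stores
    one scale of `scale_bits` bits and one zero point of `bits` bits.
    """
    if bits <= 0 or group_size <= 0:
        raise ValueError("bits and group_size must be positive")
    overhead_per_group = scale_bits + bits
    return bits + overhead_per_group / group_size


if __name__ == "__main__":
    # Configuration from this model card: 4 bits, group size 128.
    bpw = effective_bits_per_weight(bits=4, group_size=128)
    print(f"{bpw:.5f} bits per weight")                    # 4.15625
    print(f"{16 / bpw:.2f}x smaller than 16-bit weights")  # 3.85x
```

Under these assumptions the group metadata costs about 0.16 bits per weight on top of the nominal 4 bits. A smaller group size would raise that overhead.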

## Usage
Install vLLM (for example with `pip install vllm`) and start the [OpenAI-compatible server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):

```
python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-8B-Instruct-GPTQ
```
Once the server is running, query the model through the completions endpoint:
```
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
  "model": "cortecs/Meta-Llama-3-8B-Instruct-GPTQ",
  "prompt": "San Francisco is a"
}'
```
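
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server from the previous step is listening on `localhost:8000`; the helper names `build_completion_request` and `complete` and the `max_tokens` default of 32 are illustrative choices, not part of this model card.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # same address as in the curl example
MODEL = "cortecs/Meta-Llama-3-8B-Instruct-GPTQ"


def build_completion_request(prompt: str, max_tokens: int = 32) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 32) -> str:
    """Send the request and return the text of the first returned choice."""
    with urllib.request.urlopen(build_completion_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Only prints the JSON body; call complete(...) once the server is running.
    print(build_completion_request("San Francisco is a").data.decode("utf-8"))
```

With the server running, `complete("San Francisco is a")` returns the generated continuation as a string.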

## Evaluations
| __English__ | __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__ | __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__ | __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__ |
|:---|:---|:---|:---|
| Avg. | 66.97 | 67.0 | 63.52 |
| ARC | 62.5 | 62.5 | 54.6 |
| Hellaswag | 70.3 | 70.3 | 69.5 |
| MMLU | 68.11 | 68.21 | 66.46 |
| | | | |
| __French__ | __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__ | __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__ | __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__ |
| Avg. | 57.73 | 57.7 | 53.33 |
| Hellaswag_fr | 61.7 | 62.2 | 59.3 |
| ARC_fr | 53.3 | 53.1 | 46.4 |
| MMLU_fr | 58.2 | 57.8 | 54.3 |
| | | | |
| __German__ | __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__ | __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__ | __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__ |
| Avg. | 53.47 | 53.67 | 49.0 |
| ARC_de | 49.1 | 49.0 | 41.6 |
| Hellaswag_de | 55.0 | 55.2 | 53.3 |
| MMLU_de | 56.3 | 56.8 | 52.1 |
| | | | |
| __Italian__ | __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__ | __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__ | __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__ |
| Avg. | 56.73 | 56.67 | 51.3 |
| Hellaswag_it | 61.3 | 61.3 | 58.4 |
| MMLU_it | 57.3 | 57.0 | 53.0 |
| ARC_it | 51.6 | 51.7 | 42.5 |
| | | | |
| __Safety__ | __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__ | __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__ | __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__ |
| Avg. | 61.42 | 61.42 | 61.53 |
| RealToxicityPrompts | 97.2 | 97.2 | 97.2 |
| TruthfulQA | 51.65 | 51.58 | 51.98 |
| CrowS | 35.42 | 35.48 | 35.42 |
| | | | |
| __Spanish__ | __[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)__ | __[Meta-Llama-3-8B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ-8b)__ | __[Meta-Llama-3-8B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-8B-Instruct-GPTQ)__ |
| Avg. | 59.0 | 58.63 | 54.6 |
| ARC_es | 54.1 | 53.8 | 46.9 |
| Hellaswag_es | 63.8 | 63.3 | 60.3 |
| MMLU_es | 59.1 | 58.8 | 56.6 |

We did not check for data contamination.
All evaluations were run with the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) using `limit=1000`, which caps each task at 1000 examples.
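
To make the cost of quantization easier to read off, the sketch below computes how many points each quantized variant loses against the base model. The averages are copied from the "Avg." rows of the table above; the dictionary and function names are illustrative. A positive value is a loss and a negative value is a gain.

```python
# "Avg." rows from the evaluation table: (base, GPTQ 8-bit, GPTQ 4-bit).
AVERAGES = {
    "English": (66.97, 67.0, 63.52),
    "French": (57.73, 57.7, 53.33),
    "German": (53.47, 53.67, 49.0),
    "Italian": (56.73, 56.67, 51.3),
    "Safety": (61.42, 61.42, 61.53),
    "Spanish": (59.0, 58.63, 54.6),
}


def score_drops(averages):
    """Return {category: (drop_8bit, drop_4bit)} in points relative to the base model."""
    return {
        name: (round(base - q8, 2), round(base - q4, 2))
        for name, (base, q8, q4) in averages.items()
    }


if __name__ == "__main__":
    for name, (d8, d4) in score_drops(AVERAGES).items():
        print(f"{name:8s}  8-bit: {d8:+.2f}  4-bit: {d4:+.2f}")
```

On these numbers the 8-bit variant stays within about 0.4 points of the base model in every category, while the 4-bit variant loses roughly 3.5 to 5.4 points on the language benchmarks and is essentially unchanged on the safety average.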

## Performance
| Hardware | requests/s | tokens/s |
|:------------|-------------:|-----------:|
| 1x NVIDIA L4 | 3.96 | 1887.55 |
| 2x NVIDIA L4 | 4.87 | 2323.34 |
| 4x NVIDIA L4 | 5.61 | 2674.18 |

Performance was measured on [cortecs inference](https://cortecs.ai).
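
As a rough way to read the throughput table, the sketch below derives the speedup over a single GPU and the resulting per-GPU efficiency. The figures are copied from the table above; the names are illustrative, and nothing is assumed about how the multi-GPU runs were configured.

```python
# Throughput figures from the table: GPU count -> (requests/s, tokens/s).
THROUGHPUT = {1: (3.96, 1887.55), 2: (4.87, 2323.34), 4: (5.61, 2674.18)}


def scaling_report(throughput):
    """Return {gpus: (speedup, efficiency)} relative to single-GPU tokens/s."""
    base_tokens = throughput[1][1]
    report = {}
    for gpus, (_, tokens_per_s) in sorted(throughput.items()):
        speedup = tokens_per_s / base_tokens
        report[gpus] = (speedup, speedup / gpus)
    return report


if __name__ == "__main__":
    for gpus, (speedup, efficiency) in scaling_report(THROUGHPUT).items():
        print(f"{gpus} GPU(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

The speedup comes out at about 1.23x on two GPUs and 1.42x on four, so throughput grows sublinearly with GPU count for this model.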
