	---
	license: other
	datasets: wikitext
	license_link: https://llama.meta.com/llama3/license/
	---
	This is a quantized version of [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct), produced with GPTQ, a quantization method developed at [IST Austria](https://ist.ac.at/en/research/alistarh-group/),
	using the following configuration:
	- Bits: 4
	- Act order: True
	- Group size: 128
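
To get a feel for what this configuration means for memory, here is a rough back-of-the-envelope sketch. It is not part of the quantization pipeline. It assumes about 70 billion parameters (taken from the model name), one 16-bit scale and one 4-bit zero point stored per group of 128 weights, and that every weight is quantized; real checkpoints usually keep some tensors (for example embeddings) in higher precision, so treat the result as an approximation.

```python
def effective_bits_per_weight(bits, group_size, scale_bits=16, zero_bits=None):
    """Average storage cost per weight, including per-group metadata.

    Assumption (not from the model card): each group stores one scale of
    `scale_bits` bits and one zero point of `zero_bits` bits.
    """
    if zero_bits is None:
        zero_bits = bits  # assume the zero point uses the same width as a weight
    return bits + (scale_bits + zero_bits) / group_size


def estimate_weight_gb(n_params, bits, group_size):
    """Approximate size of the quantized weights in gigabytes (1 GB = 1e9 bytes)."""
    total_bits = n_params * effective_bits_per_weight(bits, group_size)
    return total_bits / 8 / 1e9


# 70e9 is an approximation of the parameter count, read off the model name.
print(f"{effective_bits_per_weight(4, 128):.3f} bits per weight")
print(f"{estimate_weight_gb(70e9, 4, 128):.1f} GB of weights (rough estimate)")
```

Under these assumptions the weights need roughly a quarter of the space of a 16-bit checkpoint, which would be about 140 GB for the same parameter count.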

	## Usage
	Install vLLM and run the [OpenAI-compatible server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):

	```
	python -m vllm.entrypoints.openai.api_server --model cortecs/Meta-Llama-3-70B-Instruct-GPTQ
	```
	Access the model:
	```
	curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d ' {
	"model": "cortecs/Meta-Llama-3-70B-Instruct-GPTQ",
	"prompt": "San Francisco is a"
	} '
	```
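
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the `curl` call above; the helper names `build_completion_request` and `complete` are hypothetical, and the `max_tokens` field is an addition that is not in the original example. It assumes the server is running locally on port 8000.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # same host and port as the curl example
MODEL = "cortecs/Meta-Llama-3-70B-Instruct-GPTQ"


def build_completion_request(prompt, max_tokens=32):
    """Build a POST request for the /completions endpoint (hypothetical helper)."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=32):
    """Send the request and return the generated text of the first choice."""
    with urllib.request.urlopen(build_completion_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Requires the vLLM server to be running:
# print(complete("San Francisco is a"))
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server.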

	## Evaluations
	\| __English__ \| __[Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ-8b)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ)__ \|
	\|:--------------\|:-----------------------------------------------------------------------------------------------\|:------------------------------------------------------------------------------------------------------------\|:------------------------------------------------------------------------------------------------------\|
	\| Avg. \| 76.19 \| 76.16 \| 75.14 \|
	\| ARC \| 71.6 \| 71.4 \| 70.7 \|
	\| Hellaswag \| 77.3 \| 77.1 \| 76.4 \|
	\| MMLU \| 79.66 \| 79.98 \| 78.33 \|
	\| \| \| \| \|
	\| __French__ \| __[Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ-8b)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ)__ \|
	\| Avg. \| 70.97 \| 71.03 \| 70.27 \|
	\| ARC_fr \| 65.0 \| 65.3 \| 64.7 \|
	\| Hellaswag_fr \| 72.4 \| 72.4 \| 71.4 \|
	\| MMLU_fr \| 75.5 \| 75.4 \| 74.7 \|
	\| \| \| \| \|
	\| __German__ \| __[Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ-8b)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ)__ \|
	\| Avg. \| 68.43 \| 68.37 \| 66.93 \|
	\| ARC_de \| 64.2 \| 64.3 \| 62.6 \|
	\| Hellaswag_de \| 67.8 \| 67.7 \| 66.7 \|
	\| MMLU_de \| 73.3 \| 73.1 \| 71.5 \|
	\| \| \| \| \|
	\| __Italian__ \| __[Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ-8b)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ)__ \|
	\| Avg. \| 70.17 \| 70.43 \| 68.63 \|
	\| ARC_it \| 64.0 \| 64.3 \| 62.1 \|
	\| Hellaswag_it \| 72.6 \| 72.4 \| 71.0 \|
	\| MMLU_it \| 73.9 \| 74.6 \| 72.8 \|
	\| \| \| \| \|
	\| __Safety__ \| __[Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ-8b)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ)__ \|
	\| Avg. \| 64.28 \| 64.17 \| 63.64 \|
	\| RealToxicityPrompts \| 97.9 \| 97.8 \| 98.1 \|
	\| TruthfulQA \| 61.91 \| 61.67 \| 59.91 \|
	\| CrowS \| 33.04 \| 33.04 \| 32.92 \|
	\| \| \| \| \|
	\| __Spanish__ \| __[Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ-8b)__ \| __[Meta-Llama-3-70B-Instruct-GPTQ](https://huggingface.co/cortecs/Meta-Llama-3-70B-Instruct-GPTQ)__ \|
	\| Avg. \| 72.5 \| 72.7 \| 71.3 \|
	\| ARC_es \| 66.7 \| 66.9 \| 65.7 \|
	\| Hellaswag_es \| 75.8 \| 75.9 \| 74 \|
	\| MMLU_es \| 75 \| 75.3 \| 74.2 \|

	We did not check for data contamination.
	Evaluation was run with the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) using `limit=1000`, which caps each task at 1000 examples.
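
The `Avg.` rows appear to be plain arithmetic means of the three task scores in each block. As a quick check, this sketch recomputes the English averages from the table and the drop of the 4-bit model relative to the unquantized baseline. The dictionary keys are labels chosen for illustration.

```python
from statistics import mean

# English scores copied from the table above, in the order ARC, Hellaswag, MMLU.
english = {
    "baseline": [71.6, 77.3, 79.66],
    "gptq_8bit": [71.4, 77.1, 79.98],
    "gptq_4bit": [70.7, 76.4, 78.33],
}

# Unweighted mean per model, rounded to two decimals like the table.
averages = {name: round(mean(scores), 2) for name, scores in english.items()}

# Absolute drop in average score of the 4-bit model versus the baseline.
drop_4bit = mean(english["baseline"]) - mean(english["gptq_4bit"])

for name, avg in averages.items():
    print(f"{name}: {avg}")
print(f"4-bit drop vs. baseline: {drop_4bit:.2f} points")
```

The same calculation applies to the other language blocks and to the Safety block.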

	## Performance
	\| Hardware \| requests/s \| tokens/s \|
	\|:--------------\|-------------:\|-----------:\|
	\| NVIDIA L40Sx2 \| 2 \| 951.28 \|

	Performance was measured on [cortecs inference](https://cortecs.ai).
