---
license: mit
library_name: adapter-transformers
---
|
Effi-13B AWQ is an AWQ-quantized version of our reasoning model [Effi-13B](https://huggingface.co/aiplanet/effi-13b).
|
|
|
## About AWQ
|
|
|
AWQ is an efficient, accurate, and fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.
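
As a minimal sketch of Transformers-based inference (assuming `transformers` with AWQ support and the `autoawq` package are installed; the repo id `aiplanet/effi-13B-AWQ` is taken from the citation below):

```python
# Minimal sketch: load the AWQ-quantized checkpoint with Transformers.
# Assumes `pip install transformers autoawq` and a CUDA-capable GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aiplanet/effi-13B-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Transformers reads the AWQ quantization config stored in the checkpoint,
# so no extra quantization arguments are needed at load time.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```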
|
|
|
It is also supported by vLLM, a continuous-batching inference server, which allows AWQ models to be used for high-throughput concurrent inference in multi-user server scenarios.
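
A sketch of offline batched inference with vLLM (assuming a vLLM version with AWQ support; the `quantization="awq"` argument tells vLLM to load the 4-bit weights):

```python
# Minimal sketch: high-throughput batched inference with vLLM.
# Assumes `pip install vllm` and sufficient GPU memory for the 4-bit model.
from vllm import LLM, SamplingParams

llm = LLM(model="aiplanet/effi-13B-AWQ", quantization="awq")
params = SamplingParams(temperature=0.7, max_tokens=256)

# vLLM batches these prompts and schedules them with continuous batching.
outputs = llm.generate(
    ["Explain step by step why the sky appears blue.",
     "What is 17 * 24? Show your reasoning."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```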
|
|
|
effi-13B is a 13-billion-parameter causal decoder-only model built by AI Planet, based on Llama-2-13b-chat-hf and fine-tuned on 1.8 million conversations from a Chain-of-Thought (CoT) dataset available on Hugging Face Datasets. The model is made available under the Apache 2.0 license.
|
|
|
## Why use effi-13B-Instruct?
|
|
|
- This is a ready-to-use chat/instruct model based on Llama-2-13b-chat-hf, which provides a rationale for the answers it generates.
|
- Llama-2 is one of the best open-source models available. However, this is an instruct model, which may not be ideal for further fine-tuning. If you are interested in building your own instruct/chat model, we recommend starting from Llama-2-13b-chat-hf.
|
You will need at least 85–100 GB of memory to run inference with effi-13b swiftly.
|
|
|
## Our benchmarking
|
|
|
| Metric               | Value |
|----------------------|-------|
| Perplexity           | 5.529 |
| MMLU                 | 50.90 |
| HellaSwag (acc)      | 59.38 |
| HellaSwag (acc_norm) | 78.91 |
| TruthfulQA           | 38.24 |
|
|
|
## Direct Use
|
|
|
effi-13b has been fine-tuned on a Chain-of-Thought dataset and is intended for chat/instruct use where a rationale for the answer is desired.
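
As an illustrative sketch of a reasoning-style prompt (the `[INST] ... [/INST]` template is an assumption carried over from the Llama-2-13b-chat-hf base model, not a documented effi-13b format):

```python
# Illustrative sketch: a Chain-of-Thought style generation.
# The [INST] prompt template is assumed from the Llama-2-chat base model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aiplanet/effi-13B-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "[INST] A train travels 120 km in 2 hours. What is its average speed? Explain your reasoning. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```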
|
|
|
## Out-of-Scope Use
|
|
|
Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
|
|
|
## Bias, Risks, and Limitations
|
|
|
This model has been trained primarily on English data and will not generalize appropriately to other languages. Furthermore, because it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
|
|
|
## Recommendations
|
|
|
We recommend that users of effi-13b develop guardrails and take appropriate precautions for any production use.
|
|
|
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
|
|
|
## Citations
|
|
|
```
@misc{lucifertrj,
  author    = {Tarun Jain},
  title     = {Effi-13B-AWQ by AI Planet},
  year      = 2024,
  url       = {https://huggingface.co/aiplanet/effi-13B-AWQ/},
  publisher = {Hugging Face}
}
```