---
license: mit
library_name: adapter-transformers
---
|
Effi-13B AWQ is an AWQ-quantized version of our reasoning model [Effi-13B](https://huggingface.co/aiplanet/effi-13b).
|
|
|
## About AWQ
|
|
|
AWQ is an efficient, accurate, and fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.
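
As a minimal sketch of Transformers-based inference (assuming `transformers` with AWQ support and the `autoawq` package are installed; the repo id `aiplanet/effi-13B-AWQ` is taken from the citation below):

```python
# Minimal sketch: load the AWQ-quantized checkpoint with Transformers.
# Assumes `pip install transformers autoawq` and a CUDA-capable GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aiplanet/effi-13B-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Transformers reads the AWQ quantization config stored in the checkpoint,
# so no extra quantization arguments are needed at load time.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```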
|
|
|
It is also supported by vLLM, a continuous-batching inference server, which allows AWQ models to be used for high-throughput concurrent inference in multi-user server scenarios.
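
A sketch of offline batched inference with vLLM (assuming a vLLM version with AWQ support; the `quantization="awq"` argument tells vLLM to load the 4-bit weights):

```python
# Minimal sketch: high-throughput batched inference with vLLM.
# Assumes `pip install vllm` and sufficient GPU memory for the 4-bit model.
from vllm import LLM, SamplingParams

llm = LLM(model="aiplanet/effi-13B-AWQ", quantization="awq")
params = SamplingParams(temperature=0.7, max_tokens=256)

# vLLM batches these prompts and schedules them with continuous batching.
outputs = llm.generate(
    ["Explain step by step why the sky appears blue.",
     "What is 17 * 24? Show your reasoning."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```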
|
|
|
effi-13B is a 13-billion-parameter causal decoder-only model built by AI Planet, based on Llama-2-13b-chat-hf and fine-tuned on 1.8 million conversations from a Chain-of-Thought (CoT) dataset available on Hugging Face Datasets. The model is made available under the Apache 2.0 license.
|
|
|
## Why use effi-13B-Instruct?
|
|
|
- This is a ready-to-use chat/instruct model based on Llama-2-13b-chat-hf, which provides a rationale for the answers it generates.
|
- Llama-2 is one of the best open-source models available. However, this is an instruct model, which may not be ideal for further fine-tuning. If you are interested in building your own instruct/chat model, we recommend starting from Llama-2-13b-chat-hf.
|
You will need at least 85–100 GB of memory to run inference with effi-13b swiftly.
|
|
|
## Our benchmarking
|
|
|
| Metric               | Value |
|----------------------|-------|
| Perplexity           | 5.529 |
| MMLU                 | 50.90 |
| HellaSwag (acc)      | 59.38 |
| HellaSwag (acc_norm) | 78.91 |
| TruthfulQA           | 38.24 |
|
|
|
## Direct Use
|
|
|
effi-13b has been fine-tuned on a Chain-of-Thought dataset and is intended for chat/instruct use where a rationale for the answer is desired.
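
As an illustrative sketch of a reasoning-style prompt (the `[INST] ... [/INST]` template is an assumption carried over from the Llama-2-13b-chat-hf base model, not a documented effi-13b format):

```python
# Illustrative sketch: a Chain-of-Thought style generation.
# The [INST] prompt template is assumed from the Llama-2-chat base model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aiplanet/effi-13B-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "[INST] A train travels 120 km in 2 hours. What is its average speed? Explain your reasoning. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```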
|
|
|
## Out-of-Scope Use
|
|
|
Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
|
|
|
## Bias, Risks, and Limitations
|
|
|
This model has been trained primarily on English data and will not generalize appropriately to other languages. Furthermore, because it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
|
|
|
## Recommendations
|
|
|
We recommend that users of effi-13b develop guardrails and take appropriate precautions for any production use.
|
|
|
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
|
|
|
## Citations
|
|
|
```
@misc{lucifertrj,
  author    = {Tarun Jain},
  title     = {Effi-13B-AWQ by AI Planet},
  year      = 2024,
  url       = {https://huggingface.co/aiplanet/effi-13B-AWQ/},
  publisher = {Hugging Face}
}
```