Summary
This is a 4-bit quantization of openlm-research/open_llama_13b, produced with GPTQ-for-LLaMa. The quantization command was:

`python ./GPTQ-for-LLaMa/llama.py ./open_llama_13b c4 --wbits 4 --true-sequential --groupsize 128 --save open-llama-13b-4bit-128g.pt`
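For reference, a minimal inference sketch follows. It assumes the `load_quant` helper from GPTQ-for-LLaMa's `llama_inference.py` (the exact signature varies between revisions of that repo) and that the original model directory still contains the config and tokenizer files.

```python
# Minimal sketch: load the 4-bit checkpoint with GPTQ-for-LLaMa and generate.
# Assumes load_quant from the repo's llama_inference.py; check your revision.
import torch
from transformers import LlamaTokenizer
from llama_inference import load_quant  # from the GPTQ-for-LLaMa repo

model_dir = "./open_llama_13b"               # original model dir (config + tokenizer)
checkpoint = "open-llama-13b-4bit-128g.pt"   # quantized weights produced above

model = load_quant(model_dir, checkpoint, 4, 128)  # wbits=4, groupsize=128
model.to("cuda")

tokenizer = LlamaTokenizer.from_pretrained(model_dir)
prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0]))
```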
The original model README is reproduced below.
OpenLLaMA: An Open Reproduction of LLaMA
In this repo, we present a permissively licensed open-source reproduction of Meta AI's LLaMA large language model. We are releasing 3B, 7B, and 13B models trained on 1T tokens. We provide PyTorch and JAX weights of pre-trained OpenLLaMA models, as well as evaluation results and a comparison with the original LLaMA models. Please see the OpenLLaMA project homepage for more details. (The full README continues at https://huggingface.co/openlm-research/open_llama_13b.)
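For the un-quantized weights, the upstream OpenLLaMA documentation shows loading through Hugging Face transformers; a sketch along those lines, assuming the standard `LlamaTokenizer`/`LlamaForCausalLM` classes, is:

```python
# Sketch: load the original fp16 OpenLLaMA weights via transformers.
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = "openlm-research/open_llama_13b"
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
generation = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(generation[0]))
```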