AI Model Name: Llama 3 70B "Built with Meta Llama 3" https://llama.meta.com/llama3/license/

This is the result of running AutoAWQ to quantize the Llama 3 70B model to ~4 bits per parameter.
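The "~4 bits" figure is approximate because group-wise quantization also stores per-group metadata. A minimal sketch of that arithmetic follows. It assumes a group size of 128 and 4-bit weights (read off the repository name, `q128-w4`), plus one FP16 scale and one 4-bit zero point per group, which is the usual AWQ layout but is not confirmed by this card:

```python
def effective_bits_per_weight(w_bit=4, group_size=128, scale_bits=16, zero_bits=4):
    """Average storage per quantized weight, including per-group metadata.

    Assumption: each group of `group_size` weights shares one scale
    (`scale_bits`) and one zero point (`zero_bits`).
    """
    return w_bit + (scale_bits + zero_bits) / group_size


print(effective_bits_per_weight())  # 4.15625
```

Layers that are left unquantized (stored as FP16) raise the whole-model average further.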

To launch an OpenAI-compatible API endpoint on your Linux server with 2x 3090 or 4090 GPUs:

git lfs install
git clone https://huggingface.co/catid/cat-llama-3-70b-awq-q128-w4-gemm

conda create -n vllm70 python=3.10 -y && conda activate vllm70

pip install -U git+https://github.com/vllm-project/vllm.git

python -m vllm.entrypoints.openai.api_server --model cat-llama-3-70b-awq-q128-w4-gemm --tensor-parallel-size 2 --gpu-memory-utilization 0.935
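If the server does start, it can be queried like any OpenAI-compatible endpoint. The sketch below uses only the Python standard library. It assumes vLLM's default port (8000) and that the model is served under the same name passed to `--model`; the helper names `build_completion_request` and `complete` are made up for this example:

```python
import json
import urllib.request

# Assumptions: default vLLM port, and model name identical to the --model argument.
BASE_URL = "http://localhost:8000"
MODEL = "cat-llama-3-70b-awq-q128-w4-gemm"


def build_completion_request(prompt, max_tokens=64, base_url=BASE_URL, model=MODEL):
    """Build a POST request for the /v1/completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    req = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Example, once the server is up:
# print(complete("The capital of France is"))
```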

Sadly, with these settings the model just barely fails to fit in GPU memory, falling short by about 300 MB.
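A rough back-of-the-envelope budget shows how little room there is. This is only a sketch under stated assumptions: 24 GiB per GPU, a round 70e9 parameters, and the nominal 4 bits per parameter. It does not reproduce the ~300 MB shortfall, which depends on what vLLM allocates at startup beyond the weights:

```python
GIB = 1024 ** 3


def gpu_budget_gib(n_gpus=2, gib_per_gpu=24.0, utilization=0.935):
    """Memory vLLM may use in total, given --gpu-memory-utilization."""
    return n_gpus * gib_per_gpu * utilization


def weights_gib(n_params=70e9, bits_per_weight=4.0):
    """Approximate size of the quantized weights (assumed round numbers)."""
    return n_params * bits_per_weight / 8 / GIB


budget = gpu_budget_gib()
weights = weights_gib()
print(f"budget:   {budget:.2f} GiB")
print(f"weights:  {weights:.2f} GiB")
print(f"headroom: {budget - weights:.2f} GiB")
```

Whatever headroom remains has to cover quantization metadata, unquantized layers, activations, and the KV cache, so a small change in any of these can tip the launch into an out-of-memory failure.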
