EXL2 quantization of Undi95's MLewd-ReMM-L2-Chat-20B.

Model details

Quantized at 3.18 bpw (bits per weight) with a 6-bit head layer (hb 6). Runs the full 4K context in 16 GB of VRAM. An 8.13 bpw version is also available.
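As a rough sanity check on the VRAM figure, the weight footprint can be estimated from the bit rate. This is only a sketch: it assumes exactly 20 billion parameters (the "20B" in the name is approximate) and ignores the KV cache, activations, and the higher-precision head layer, so real usage will be higher. The function name `weight_size_gb` is made up for illustration.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    total_bits = n_params * bits_per_weight
    return total_bits / 8 / 1e9


# Assumption: 20e9 parameters, which is only a nominal count.
N_PARAMS = 20e9

for bpw in (3.18, 8.13):
    print(f"{bpw} bpw: ~{weight_size_gb(N_PARAMS, bpw):.2f} GB of weights")
```

Under these assumptions the 3.18 bpw weights come to roughly 8 GB, which leaves headroom on a 16 GB card for the cache and runtime overhead, while the 8.13 bpw version would not fit.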

Perplexity:

Base = 6.5820

8.13 bpw = 6.5535

3.18 bpw, hb 6 = 6.6928

Dataset = wikitext
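To put the perplexity numbers in perspective, the change relative to the base model can be computed directly from the values listed above. This is a minimal sketch; the dictionary keys and the helper name `relative_change_pct` are illustrative only.

```python
# Perplexity values copied from the list above (wikitext).
BASE_PPL = 6.5820
QUANT_PPL = {
    "8.13 bpw": 6.5535,
    "3.18 bpw, hb 6": 6.6928,
}


def relative_change_pct(quant_ppl: float, base_ppl: float) -> float:
    """Percentage change in perplexity versus the base model."""
    return (quant_ppl - base_ppl) / base_ppl * 100.0


for name, ppl in QUANT_PPL.items():
    print(f"{name}: {relative_change_pct(ppl, BASE_PPL):+.2f}% vs base")
```

The 3.18 bpw quant lands under 2% above the base perplexity, and the 8.13 bpw quant is within half a percent of it.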

Prompt Format

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
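
The template above can be filled in programmatically. A minimal sketch, assuming the model expects a newline right after "### Response:" (the card does not say so explicitly); `build_prompt` is a hypothetical helper, not part of any library.

```python
# Alpaca-style template from the card. The trailing newline after
# "### Response:" is an assumption.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "\n"
    "### Instruction:\n"
    "{prompt}\n"
    "\n"
    "### Response:\n"
)


def build_prompt(prompt: str) -> str:
    """Insert the user's instruction into the template."""
    return PROMPT_TEMPLATE.format(prompt=prompt.strip())


if __name__ == "__main__":
    print(build_prompt("Write a haiku about quantization."))
```

The resulting string is what gets passed to whichever loader is used; generation should continue from the end of the "### Response:" line.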