
FlatDolphinMaid-8x7B 4bpw

Exllama quant of Undi95/FlatDolphinMaid-8x7B

You probably want the 3.5bpw version. It just fits in 24 GB of VRAM at half context (16384 tokens).

If you really want the larger context, 3bpw should do it, but you are probably better off with a GGUF version at higher quants.

I did make a 4bpw; it might work in a headless or multi-GPU setup.

Other BPWs: 3.0bpw, 3.5bpw, 4.0bpw

Make sure you enable the 8-bit cache.

Prompt format:

```
### Instruction:
{system prompt}

### Input:
{input}

### Response:
{reply}
```
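If you are templating prompts yourself rather than relying on a frontend, the format above can be assembled with a small helper; a minimal sketch (the function name `build_prompt` is illustrative, not from any library):

```python
def build_prompt(system_prompt: str, user_input: str) -> str:
    """Assemble an Alpaca-style prompt in the format this model expects."""
    return (
        f"### Instruction:\n{system_prompt}\n\n"
        f"### Input:\n{user_input}\n\n"
        "### Response:\n"
    )

# The model's reply is generated after the "### Response:" header.
prompt = build_prompt("You are a helpful assistant.", "Say hello.")
print(prompt)
```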

Contact

Kooten on Discord.
