Update README.md

- [2.25bpw6h quants](https://huggingface.co/luigi86/magnum-72b-v1-exl2-rpcal/tree/2.25bpw6h) (tested and working on a single RTX 3090 24GiB at 16k context length)
- [2.4bpw6h quants](https://huggingface.co/luigi86/magnum-72b-v1-exl2-rpcal/tree/2.4bpw6h) (may not load on 24GiB VRAM machines!)
- [3.0bpw8h quants](https://huggingface.co/luigi86/magnum-72b-v1-exl2-rpcal/tree/3.0bpw8h)
- [4.0bpw8h quants](https://huggingface.co/luigi86/magnum-72b-v1-exl2-rpcal/tree/4.0bpw8h) (tested and working on two 3090s at 32k context/cache)
- [4.4bpw8h quants](https://huggingface.co/luigi86/magnum-72b-v1-exl2-rpcal/tree/4.4bpw8h) (tested and working on two 3090s at 32k context, 64k Q4 cache (for CFG or parallelism) with tabbyAPI; see the loading sketch below)
- [4.5bpw8h quants](https://huggingface.co/luigi86/magnum-72b-v1-exl2-rpcal/tree/4.5bpw8h)
- [8.0bpw8h quants](https://huggingface.co/luigi86/magnum-72b-v1-exl2-rpcal/tree/8.0bpw8h)
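
Each quant lives on its own branch of this repo, so you can pull just the one you want. A minimal sketch using the `huggingface_hub` Python client (the `revision` values are the branch names above; the destination folder is only an example):

```python
from huggingface_hub import snapshot_download

# Each bpw variant is a separate branch; pass the branch name as `revision`
# to download only that quant's files.
snapshot_download(
    repo_id="luigi86/magnum-72b-v1-exl2-rpcal",
    revision="4.0bpw8h",                 # any branch from the list above
    local_dir="magnum-72b-v1-4.0bpw8h",  # example destination folder
)
```
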
All tests performed on a headless Linux instance with no active desktop environment to maximize VRAM.
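
If you drive exllamav2 directly rather than through tabbyAPI, a minimal loading sketch with a quantized Q4 KV cache (the setup that let 32k context fit across two 3090s above) might look like the following; the model path is a placeholder, and tabbyAPI does this wiring for you via its config:

```python
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Cache_Q4,
    ExLlamaV2Config,
    ExLlamaV2Tokenizer,
)

config = ExLlamaV2Config()
config.model_dir = "magnum-72b-v1-4.4bpw8h"  # placeholder: downloaded quant dir
config.prepare()
config.max_seq_len = 32768  # 32k context, as in the tests above

model = ExLlamaV2(config)

# The Q4 cache stores keys/values at ~4 bits instead of FP16, roughly
# quartering KV-cache VRAM, which is what leaves room for long contexts.
cache = ExLlamaV2Cache_Q4(model, lazy=True)
model.load_autosplit(cache)  # split layers across all visible GPUs

tokenizer = ExLlamaV2Tokenizer(config)
```
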
Other quants available on request, feel free to ask!