Update README.md
README.md
CHANGED
@@ -3,4 +3,8 @@ license: other
 ---
 5 bit quantization of airoboros 70b 1.4.1, using exllama2.
 
-On 2x4090, 3072 ctx seems to work fine with 21.5,22.5 gpu_split and max_attention_size = 1024 ** 2 instead of 2048 ** 2.
+On 2x4090, 3072 ctx seems to work fine with 21.5,22.5 gpu_split and max_attention_size = 1024 ** 2 instead of 2048 ** 2.
+
+4096 ctx may be feasible on a single 48GB VRAM GPU (like A6000).
+
+Tests are welcome.
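
For anyone trying the 2x4090 setup, here is a minimal sketch of the numbers the README mentions, written as plain Python so it runs without exllama2 installed. The helper `parse_gpu_split` and the `settings` dictionary are hypothetical names used only for illustration. How these values are passed to exllama2 (config attributes, loader arguments, or UI fields) depends on the version in use and is not shown.

```python
from typing import List


def parse_gpu_split(text: str) -> List[float]:
    """Turn a string such as "21.5,22.5" into per-GPU VRAM budgets in GB.

    Hypothetical helper: the name and the string format are assumptions.
    """
    return [float(part) for part in text.split(",") if part.strip()]


# Values taken from the README line for 2x4090.
settings = {
    "max_seq_len": 3072,                        # context length ("3072 ctx")
    "gpu_split": parse_gpu_split("21.5,22.5"),  # GB allotted to GPU 0 and GPU 1
    "max_attention_size": 1024 ** 2,            # reduced value from the README
}

# The value the README says it replaces.
previous_attention_size = 2048 ** 2

# How much smaller the reduced setting is.
reduction = previous_attention_size // settings["max_attention_size"]

print(settings)
print("total VRAM budget (GB):", sum(settings["gpu_split"]))
print("max_attention_size reduced by a factor of", reduction)
```

The split leaves a little headroom under the 24 GB each 4090 provides. Whether that headroom is enough at 3072 ctx is what the README's "seems to work fine" reports, so treat it as an observation, not a guarantee.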