LavaPlanet
committed on
Commit 1b7774b
Parent(s): 8dc4766
Update README.md
README.md
CHANGED
@@ -1,3 +1,19 @@
 ---
 license: llama2
 ---
+Another EXL2 version of AlpinDale's goliath-120b (https://huggingface.co/alpindale/goliath-120b), this one at 2.64 BPW and quantized with the new experimental quant method of exllamav2.
+
+
+Pippa llama2 Chat was used as the calibration dataset.
+
+Can be run on two RTX 3090s with 24 GB of VRAM each.
+
+Assuming Windows overhead, the following figures should be close enough to estimate your own usage.
+```yaml
+2.64BPW @ 4096 ctx
+Empty Ctx
+GPU Split: 18/24
+GPU1: 19.8/24
+GPU2: 21.9/24
+~10 tk/s
+```
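
The memory figures in the README can be sanity-checked with a little arithmetic. The sketch below is only a rough estimate, not part of the commit: the parameter count `N_PARAMS` is an assumption (the README does not state it), and the helper `weight_gib` is a hypothetical name introduced for illustration. It also treats the reported per-GPU numbers as gigabytes, which the README does not specify.

```python
# Rough estimate of quantized weight size versus the reported GPU usage.
# Assumption: goliath-120b has about 118e9 parameters (not stated in the README).

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Return the size of the quantized weights in GiB (1 GiB = 2**30 bytes)."""
    return n_params * bits_per_weight / 8 / 2**30

N_PARAMS = 118e9   # assumed parameter count
BPW = 2.64         # bits per weight, from the README

weights = weight_gib(N_PARAMS, BPW)

# GPU1 + GPU2 usage reported in the README at 4096 ctx with an empty context.
measured = 19.8 + 21.9

print(f"estimated weights: {weights:.1f} GiB")
print(f"reported usage:    {measured:.1f}")
print(f"remainder (cache, activations, OS overhead): {measured - weights:.1f}")
```

Under these assumptions the weights alone come to roughly 36 GiB, so most of the reported usage on the two cards is the model itself, with the rest going to the context cache and system overhead.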