readme linked up with schedule, calibration info, some attempt at reproducibility, nonsense
---
license: other
license_name: yi-license
license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
tags:
- merge
- GGUF
- imatrix
- 2bit
---

# Kyllene-57B

[Kyllene-57B](TeeZee/Kyllene-57B-v1.0) quantized to 2~3 bpw GGUF

### NOTICE: I did not use the original file! I started with Q6_K (there was no Q8)

#### There may well be problems with these quants, but I'll eat my own ass if a 57B Q6_K (>6.5 bpw) is the root of any of them. More suspect is how I produced the imatrix.

[imatrix included.](./Kyllene-57B-v1.0.q6_K.gguf.imatrix) Generated from [a 900k text file, also included](./techmulcodetiny.utf8).

This file was made by concatenating most of the [default exllamav2 calibration data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data): a ~900 KB file of coherent text only, with some formatting and code but no endless broken HTML tags or nonsense. It includes multilingual text, for those deep layers.

Artefact produced from:

```
$ cd exllamav2/conversion/standard_cal_data
$ cat technical.utf8 multilingual.utf8 code.utf8 tiny.utf8 >> techmulcodetiny.utf8
```

where [exllamav2/conversion/standard_cal_data](https://github.com/turboderp/exllamav2/tree/master/conversion/standard_cal_data) and [techmulcodetiny.utf8](./techmulcodetiny.utf8) produce a file that imatrix consumes as ~560 "chunks".
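As a rough sanity check, that chunk count follows from the file size under a couple of assumptions of mine (not the author's): roughly 3.2 bytes per token for mixed text/code, and imatrix's default chunk length of 512 tokens.

```shell
# Back-of-envelope estimate of the imatrix chunk count.
# Assumptions (mine): ~900 KiB file, ~3.2 bytes/token, 512-token chunks.
bytes=921600
tokens=$(( bytes * 10 / 32 ))   # divide by 3.2 using integer maths
chunks=$(( tokens / 512 ))
echo "$chunks"                  # lands in the neighbourhood of ~560
```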
imatrix was run with default sampling settings besides the dataset (I think? I increased the batch number and reduced the batch size so I could cram on more layers, but the generation should have been the same in the end).

(Someone tell me why I was wrong to run imatrix with `-cb` continuous batching. Shame me.)
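For anyone attempting to reproduce this, the run described above would look roughly like the following. This is a sketch, not the recorded command: the file names are this repo's, but the `-ngl` value is a placeholder and the batch tweaks mentioned above are not captured here.

```shell
# Hypothetical reconstruction of the imatrix run (llama.cpp's imatrix
# tool). -ngl is a placeholder; adjust to however many layers fit.
./imatrix -m Kyllene-57B-v1.0.q6_K.gguf \
    -f techmulcodetiny.utf8 \
    -o Kyllene-57B-v1.0.q6_K.gguf.imatrix \
    -ngl 20
```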
# Downloads (eventually)

`upload in progress:`

[IQ2_XS](./Kyllene-57B-v1.0.IQ2_XS.gguf/) 2.38 BPW `CUDA0 buffer size = 15941.43 MiB`
- This file only exists because I did the maths wrong (I was expecting it to be bigger), but I recall that 16GB GPUs exist and I may give it a go alongside stable diffusion.
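At ~15.6 GiB of weight buffer, this quant only fits a 16 GB card if there is room left for the KV cache; the usual knob is `-ngl`. A hypothetical llama.cpp invocation (the layer count and context size are illustrative, not tested on this model):

```shell
# Hypothetical: partial offload so weights + KV cache fit in 16 GB.
# Lower -ngl until the CUDA buffer plus your context length fits.
./main -m Kyllene-57B-v1.0.IQ2_XS.gguf -ngl 50 -c 4096 -p "Hello"
```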
`under reconstruction:`

[IQ2_M](./Kyllene-57B-v1.0.IQ2_M.gguf/) 2.7 BPW briefly existed before I [clobbered](http://catb.org/jargon/html/C/clobber.html) _(verb, transitory)_ it. It ~~might~~ will be back.

`upload scheduled next:`

[IQ3_XXS](./Kyllene-57B-v1.0.IQ3_XXS.gguf/) (3.0 < s < 3.1 BPW)
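For completeness, quants like these are produced from a source GGUF plus the imatrix with llama.cpp's quantize tool. A sketch using this repo's file names (the exact invocation is my assumption, not the author's recorded command):

```shell
# Hypothetical sketch: producing the IQ2_XS quant from the Q6_K source
# (per the NOTICE above) with the bundled imatrix.
./quantize --imatrix Kyllene-57B-v1.0.q6_K.gguf.imatrix \
    Kyllene-57B-v1.0.q6_K.gguf \
    Kyllene-57B-v1.0.IQ2_XS.gguf \
    IQ2_XS
```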