|
---
license: llama2
---
|
Quants for Sao10K's WinterGoddess 1.4x 70B model: https://huggingface.co/Sao10K/WinterGoddess-1.4x-70B-L2
|
|
|
With a twist: the model I used comes from a third party, and has been tweaked with limarpv3 and a linear rope 8 training to reach 32k context (with even better results at rope 4 and rope 2, and probably at other lower ropes as well).
|
|
|
I don't know who did the work, only that I found this Q4_K_S quant of it floating around, without an FP16: https://huggingface.co/mishima/WinterGoddess-1.4x-limarpv3-70B-L2-32k.GGUF
|
|
|
So I made a Q8_0 out of it (the best base to requantize from afterwards), and requantized it into the quants below (a sketch of the workflow follows the list):
|
|
|
Full offload possible on 48GB VRAM with a huge context size : |
|
|
|
Q3_K_L |
|
|
|
Full offload possible on 36GB VRAM with a variable context size (up to 7168 with Q3_K_M, for example) |
|
|
|
Q3_K_M, Q3_K_S, Q3_K_XS, |
|
IQ3_XXS SOTA (equivalent to a Q3_K_S while allowing more context; the filename is partly wrong, ch2500 is the real value)
|
Lower quality : Q2_K, Q2_K_S |
|
|
|
Full offload possible on 24GB VRAM with a decent context size. |
|
|
|
IQ2_XS SOTA (filename is partly wrong, b2035 and ch2500 are the real values) |
|
|
|
The higher the ch number (the iMatrix chunk count), the better the quality.
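For reference, here is a minimal sketch of the requantization flow described above, assuming a local llama.cpp build and its `quantize` tool (binary name, paths and flags are illustrative and may differ between versions; `--allow-requantize` is needed because the source file is already a quant):

```python
import subprocess

SRC = "WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf"        # third-party quant
Q8  = "WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.Q8_0.gguf"  # intermediate

def quantize(src: str, dst: str, qtype: str) -> None:
    # llama.cpp's quantize refuses to requantize an already-quantized
    # model unless --allow-requantize is passed.
    subprocess.run(["./quantize", "--allow-requantize", src, dst, qtype], check=True)

# Step 1: rebuild a Q8_0 from the Q4_K_S (best intermediate to requantize from).
quantize(SRC, Q8, "Q8_0")

# Step 2: derive the smaller K-quants from that Q8_0.
# The iMatrix-based quants (IQ3_XXS, IQ2_XS, Q2_K_S, and the iMat variants)
# additionally take an --imatrix file, as sketched further down.
for qtype in ["Q3_K_L", "Q3_K_M", "Q3_K_S", "Q3_K_XS", "Q2_K"]:
    quantize(Q8, SRC.replace("Q4_K_S", qtype), qtype)
```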
|
|
|
And as a bonus to play with it, my KoboldCPP_-_v1.55.1.b1933_-_Frankenstein build from 21/01/2024: https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.55.1_b1933
|
|
|
----- |
|
|
|
Edit: Due to a CPU (i7-6700K) that is weak for AI purposes, and only 36GB of VRAM, I remade Q3_K_S and Q2_K with a small iMatrix of ctx 32 with 25 chunks (so, 800 tokens).

And good news: it lowers the perplexity by:
|
|
|
More than 3% with linear rope 8 (positional embeddings compression) on Q2_K
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,6.2489,512 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.0482,512 |
|
|
|
More than 2% with linear rope 4 on Q2_K
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.8859,512 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.7739,512 |
|
|
|
More than 1.5% with linear rope 2 on Q2_K |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5030,512 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,4.42,512 |
|
|
|
More than 1% with linear rope 8 on Q3_K_S |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q3_K_S.gguf,-,wikitext,5.6127,512 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,5.5461,512 |
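For clarity, the percentages above are simply the relative drop (baseline - iMatrix) / baseline, computed from the wikitext perplexities listed; a quick check:

```python
# (baseline quant, iMatrix c32_ch25 variant) wikitext perplexities quoted above
pairs = {
    "Q2_K, linear rope 8":   (6.2489, 6.0482),
    "Q2_K, linear rope 4":   (4.8859, 4.7739),
    "Q2_K, linear rope 2":   (4.5030, 4.42),
    "Q3_K_S, linear rope 8": (5.6127, 5.5461),
}

for label, (base, imat) in pairs.items():
    drop = (base - imat) / base * 100
    print(f"{label}: -{drop:.2f}% perplexity")
# -> about -3.2%, -2.3%, -1.8% and -1.2%, matching the claims above.
```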
|
|
|
----- |
|
|
|
Edit: A Q3_K_XS, a new quant offered in LlamaCPP, is on the way, with an iMatrix of ctx 32 with 2500 chunks (so, 80,000 tokens).
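A minimal sketch of that iMatrix pass, assuming llama.cpp's `imatrix` tool and a local calibration text file (file names and flags are illustrative and may vary between builds):

```python
import subprocess

MODEL   = "WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant.Q8_0.gguf"
CALIB   = "calibration.txt"            # hypothetical calibration corpus
IMATRIX = "winter-c32_ch2500.imatrix"  # hypothetical output name

# Compute the importance matrix: ctx 32 over 2500 chunks = 80,000 tokens processed.
subprocess.run(["./imatrix", "-m", MODEL, "-f", CALIB,
                "-c", "32", "--chunks", "2500", "-o", IMATRIX], check=True)

# Requantize with the importance matrix applied.
subprocess.run(["./quantize", "--allow-requantize", "--imatrix", IMATRIX, MODEL,
                "WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf",
                "Q3_K_XS"], check=True)
```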
|
|
|
----- |
|
|
|
Interestingly, linear rope 2.5 (and linear rope 1.6 as well, after further testing) is almost lossless compared to linear rope 2, while 3 and 3.2 are quite good. Here are the values (several quants at linear rope 2.5, then the normal Q2_K at rope 3 and 3.2):
|
- Linear rope 2.5 (max context 10240) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q3_K_S.gguf,-,wikitext,4.0509,512 |
|
- Linear rope 2.5 (max context 10240) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1952-iMat-c32_ch2500-Q3_K_XS.gguf,-,wikitext,4.2327 |
|
- Linear rope 2.5 (max context 10240) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.5246,512 |
|
- Linear rope 2.5 (max context 10240) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K_S.gguf,-,wikitext,4.6789,512 |
|
|
|
- Linear rope 3 (max context 12288) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6203,512 |
|
- Linear rope 3.2 (max context 13107) : WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,4.6679,512 |
|
|
|
And for the adventurous, linear rope 10 (max context 40960): WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-Q2_K.gguf,-,wikitext,7.1577,512

- Minus 3% with my Q2_K with the c32_ch25 iMatrix: WinterGoddess-1.4x-limarpv3-70B-L2-32k-Requant-AR-b1924-iMat-c32_ch25-Q2_K.gguf,-,wikitext,6.9405,512
|
|
|
So the linear rope, at least on this model, is flexible, and you can lower it to get the best perplexity for the max context you actually need.
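In other words, the usable max context simply scales linearly with the rope factor from the model's native 4k window, so you can pick the smallest factor that covers your target context. A quick illustration of the figures quoted above:

```python
NATIVE_CTX = 4096  # Llama 2 native context window

def max_context(linear_rope: float) -> int:
    # Linear rope scaling stretches the position range, so the usable
    # window is just the native context times the factor.
    return int(NATIVE_CTX * linear_rope)

for factor in (2, 2.5, 3, 3.2, 4, 8, 10):
    print(factor, max_context(factor))
# -> 2.5 gives 10240, 3 gives 12288, 3.2 gives 13107, 10 gives 40960,
#    matching the max-context figures listed above.
```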
|
|
|
All these results are reproducible, with lower deltas between them for Q3_K_S, and I suppose for other quants as well.
|
|
|
Then, I wonder about applying an NTK rope on top of it to extend the context further (even if it screws with the integrity of numbers in chat).
|
Multiply the linear rope factor (2, 4, 8, whatever) by 5888 tokens (Alpha 1.6, or RBF 16119.8), 6144 (Alpha 1.8, or RBF 18168.7), or even 7424 (Alpha 2.2, or RBF 22277).
|
This is to get a further boost in max context size. Example with linear rope 8 and Alpha 2.2/RBF 22277: 8 * 7424 = 59392.
|
It's only theoretical of course, but worth testing.
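For reference, the RBF (RoPE base frequency) values quoted for each Alpha match the usual NTK-aware rule base' = 10000 * alpha^(d/(d-2)) with a head dimension d of 128 on a Llama 2 70B. This is my reconstruction of the numbers, not something stated by the original quantizer; a quick check, plus the stacked-context arithmetic:

```python
HEAD_DIM = 128    # rotary/head dimension of Llama 2 70B
BASE     = 10000  # default RoPE base frequency

def ntk_rope_base(alpha: float) -> float:
    # NTK-aware scaling raises the base frequency instead of compressing positions.
    return BASE * alpha ** (HEAD_DIM / (HEAD_DIM - 2))

for alpha, ctx in [(1.6, 5888), (1.8, 6144), (2.2, 7424)]:
    print(f"alpha {alpha}: base ~{ntk_rope_base(alpha):.1f}, context {ctx}")
# -> ~16119.8, ~18168.7, ~22277.6, matching the RBF values above.

# Stacking linear rope 8 on top of the Alpha 2.2 window (theoretical):
print(8 * 7424)   # 59392 tokens
```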
|
|
|
----- |
|
|
|
Original 70B 4k model perplexity:
|
- WinterGoddess-1.4x-70B-L2.Q3_K_M.gguf,-,wikitext,3.7428,512,PEC1 |
|
|
|
Benchmarks of the original Q4_K_S quant I found:
|
|
|
Linear rope 8 10000 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.2177,4096 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1324,6144 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.3923,2048 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.4945,1536 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.6700,1024 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,5.2577,512 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,84.5,,400 |
|
|
|
Linear rope 4 10000 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.5762,2048 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,4.1235,512 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,87.25,,400 |
|
|
|
Linear rope 2 10000 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.3394 *327,2048 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,wikitext,3.8254,512 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,88,,400 |
|
|
|
Linear rope 1 10000 |
|
- WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf,-,hellaswag,85,,400 |
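For anyone wanting to reproduce these wikitext figures, here is a minimal sketch using llama.cpp's `perplexity` tool. Note that `--rope-freq-scale` expects the inverse of the linear rope factor; paths, flags and defaults are illustrative and may differ between builds:

```python
import subprocess

MODEL    = "WinterGoddess-1.4x-limarpv3-70B-L2-32k.Q4_K_S.gguf"
WIKITEXT = "wikitext-2-raw/wiki.test.raw"   # hypothetical local path to the test set

def perplexity(linear_rope: float, ctx: int) -> None:
    # "Linear rope 8 10000" above = linear factor 8, base frequency 10000,
    # i.e. --rope-freq-scale 0.125 --rope-freq-base 10000 in llama.cpp terms.
    subprocess.run(["./perplexity", "-m", MODEL, "-f", WIKITEXT,
                    "-c", str(ctx),
                    "--rope-freq-base", "10000",
                    "--rope-freq-scale", str(1.0 / linear_rope)], check=True)

perplexity(8, 512)   # should land near the 5.2577 figure quoted above, if the setup matches
```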