---
license: apache-2.0
language:
- en
tags:
- creative
- story
- writing
- fiction
- float32
- roleplaying
- rp
- enhanced
- space whale
- 32 bit upscale
---
[quants uploading in progress]

<font color=red><h3> Ultra High Remaster of the incredible: Psyonic-Cetacean-20b. </h3></font>

This is a Floating Point 32 upscale, where all components and merges were remastered to floating point 32.
This includes recreating all the merges from master files and, where possible, substituting full FP32 models.

The goal: carry forward maximum precision right up to the point where the model is "GGUFed".

This includes an F32 master file for GGUF too... at a whopping 78 GB.
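
For readers who want the general idea, here is a minimal sketch (not the author's exact pipeline) of up-casting a source model to full float32 with `transformers` before any merging or GGUF conversion; the output path below is just an example.

```python
import torch
from transformers import AutoModelForCausalLM

# Hedged sketch: load one merge component fully in float32 instead of the
# default (b)float16, then save it so every downstream merge step works
# from 32-bit weights. The repo id is the original base model, used only
# as an example component.
model = AutoModelForCausalLM.from_pretrained(
    "jebcarter/psyonic-cetacean-20B",
    torch_dtype=torch.float32,          # keep all weights at full 32-bit precision
    low_cpu_mem_usage=True,
)
model.save_pretrained("psyonic-cetacean-20b-fp32", safe_serialization=True)
```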

WHY?

Because F32 keeps roughly 7 significant DECIMAL digits of precision, while BF16 keeps only about 3.
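
A quick way to see the gap is to round-trip a value through bfloat16 in PyTorch (illustrative only):

```python
import torch

x = torch.tensor(1.2345678, dtype=torch.float32)
print(x.item())                      # ~1.2345678  (float32 keeps ~7 decimal digits)
print(x.to(torch.bfloat16).item())   # 1.234375    (bfloat16 keeps ~3 decimal digits)
```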

And as each merge / model is modified there are "losses" along the way.

These losses are carried forward and in turn lead to more losses.

And those decimal places are critical to model performance.

SMALL?

Yes... but multiplied across every merge and compression step: 20 billion times over.
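
To make the "losses carried forward" point concrete, here is a toy comparison (a simple average stands in for a real merge recipe; the tensor sizes are arbitrary) of doing one merge step in float32 versus storing every intermediate in bfloat16:

```python
import torch

torch.manual_seed(0)
a = torch.randn(1_000_000)   # stand-ins for two models' weights
b = torch.randn(1_000_000)

merged_fp32 = (a + b) / 2                                 # merge kept in float32
merged_bf16 = ((a.to(torch.bfloat16) + b.to(torch.bfloat16)) / 2
               ).to(torch.float32)                        # intermediates in bfloat16

# Tiny per-weight error, but it applies to every one of ~20 billion weights
# and is repeated at every merge / compression step.
print((merged_fp32 - merged_bf16).abs().mean().item())
```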

<B>The result:</b>

At Q2K, an impressive drop of 533 points in perplexity (lower is better; one "point" here = 0.0001 ppl).
(VS: Q2K original base model: PPL = 9.8077 +/- 0.06821)

At Q4KM, a whopping drop of 976 points in perplexity.
(VS: Q4KM original base model -> PPL = 8.7858 +/- 0.06074)

At Q6, an awesome drop of 234 points in perplexity.
(VS: Q6 original base model -> PPL = 8.6070 +/- 0.05907)

To put this in perspective, "Q6" now operates ABOVE the original full precision version of "Psyonic-Cetacean-20b",
and Q4KM operates at close to Q6 level quality.

This is because at "Q6" the quantized / compressed model is considered accurate to within "+0.0008 ppl" of the full,
uncompressed / unquantized model, and it exceeds this threshold by over 200 points.

But... what about Q8? 

The mountain moved:

150 points better: PPL = 8.5850 +/- 0.05881  VS: BASE/ORIGINAL: PPL = 8.6012 +/- 0.05900
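
For context, perplexity is simply exp(mean negative log-likelihood) on held-out text. The numbers above were presumably measured on the GGUF quants with llama.cpp; the sketch below only illustrates the concept, and the model name and "eval.txt" are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Conceptual sketch of a perplexity check (not the exact evaluation used above).
name = "jebcarter/psyonic-cetacean-20B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
model.eval()

ids = tok(open("eval.txt").read(), return_tensors="pt").input_ids[:, :1024]
with torch.no_grad():
    loss = model(ids, labels=ids).loss        # mean negative log-likelihood
print(torch.exp(loss).item())                 # perplexity: lower is better
```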

<B>The bottom line here is this:</b>

Higher quality instruction following and output.

Likewise, you can use a smaller compression, with higher tokens per second, and still get great quality.

Same great model... turbo charged.

This is the first group of remasters.

<B>The FOUR Horsemen:</B>

This repo will be followed by a "reg quant plus" repo, which adds additional components into the GGUF (at all levels) at floating point 32
precision to further increase the sheer creativity and raw AI horsepower.

This process shaves an extra 50-100 points off perplexity... again.

Following this group will be a full float 32 precision Imatrix repo (including "imatrixed" versions of the reg quants).

Test results VS the original and "ultra" regular quants will be posted when they come in.

An Imatrix Plus repo (with the same floating point 32 enhancements as the "reg quant plus" repo) will then push the limits even more.

Details of all methods (and pitfalls to avoid) employed to make these high precision remasters will be
posted shortly, along with comparisons of the original model and the new ultra remaster.

Thanks again to Jeb Carter, the original creator of "Psyonic-Cetacean 20B":

[ https://huggingface.co/jebcarter/psyonic-cetacean-20B ]