DavidAU committed
Commit 8b25291
1 Parent(s): 66bfb73

Update README.md

Files changed (1)
  1. README.md +39 -12
README.md CHANGED
@@ -1,5 +1,17 @@
  ---
  license: apache-2.0
+ language:
+ - en
+ tags:
+ - creative
+ - story
+ - writing
+ - fiction
+ - float32
+ - roleplaying
+ - rp
+ - enhanced
+ - space whale
  ---
  [quants uploading in progress]
 
@@ -12,15 +24,17 @@ The goal: Carry forward maximum precision right up to the point where it is "GUF
 
  This includes the F32 master file for GGUF too... at a whopping 78 GBs.
 
- Why?
+ WHY?
 
- Because the difference between F32 vs BF16 is... 8 DECIMAL places.
+ Because the difference between F32 vs BF16 is... over 8 DECIMAL places.
 
  And as each merge / model is modified there are "losses" along the way.
 
  These losses are carried forward and in turn lead to more losses.
 
- Small?
+ And decimal points are critical to model performance.
+
+ SMALL?
 
  Yes... but multiplied by each merge(s) and compression(s): 20 billion times.
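For intuition on the precision gap described in this hunk, here is a minimal Python sketch (an illustration, not code from this repo): bfloat16 keeps only 7 mantissa bits against float32's 23, so roughly 2-3 significant decimal digits survive instead of ~7, and re-rounding at every merge / conversion step compounds the drift.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (float64) to float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def bf16(x: float) -> float:
    """Truncate a float32 bit pattern to bfloat16: keep the sign, 8 exponent
    bits, and 7 mantissa bits. (Real converters round-to-nearest; plain
    truncation is close enough to show the effect.)"""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

w = 0.123456789
print(f"f64 : {w:.10f}")        # 0.1234567890
print(f"f32 : {f32(w):.10f}")   # ~7 significant decimal digits survive
print(f"bf16: {bf16(w):.10f}")  # only ~2-3 significant digits survive

# The losses carry forward: apply the same small update a few times,
# re-rounding to bf16 at every step, and the error grows.
x = w
for _ in range(5):
    x = bf16(x * 1.01)
print(f"5 bf16 steps: {x:.10f}  vs f64 math: {w * 1.01 ** 5:.10f}")
```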
 
@@ -29,17 +43,19 @@ Yes... but multiplied by each merge(s) and compression(s): 20 billion times.
  At Q2K an impressive drop of 533 points in perplexity. (lower is better)
  (VS: Q2K original base model: PPL = 9.8077 +/- 0.06821)
 
- At Q4KM an incredible drop of 976 points in perplexity.
+ At Q4KM a whopping drop of 976 points in perplexity.
  (VS: Q4KM original base model -> PPL = 8.7858 +/- 0.06074)
 
  At Q6 an awesome drop of 234 points in perplexity.
  (VS: Q6 original base model -> PPL = 8.6070 +/- 0.05907)
 
- To put this in perspective "Q6" now operates ABOVE the orginal full precision version of "Psyonic-Cetacean-20b".
+ To put this in perspective, "Q6" now operates ABOVE the original full-precision version of "Psyonic-Cetacean-20b",
+ and Q4KM operates at close to Q6-level quality.
 
- This because at "Q6" the quant / compressed model is considered to be accurate within "+0.0008 ppl" of the full, uncompressed / unquanted model and it exceeds this by over 200 points.
+ This is because at "Q6" the quantized / compressed model is considered to be accurate within "+0.0008 ppl" of the full,
+ uncompressed / unquantized model, and it exceeds this threshold by over 200 points.
 
- <B>The bottom line here is:</b>
+ <B>The bottom line here is this:</B>
 
  Higher quality instruction following and output.
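A note on units, inferred from the numbers in this hunk rather than stated by the repo: the "points" appear to count the fourth decimal place of perplexity (1 point = 0.0001 ppl), which is what makes the Q6 drop of 234 points land "over 200 points" beyond the quoted +0.0008 ppl window. Perplexity itself is exp of the mean negative log-likelihood over the test tokens. A small sketch under that assumption:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """PPL = exp(mean negative log-likelihood), the quantity
    llama.cpp's perplexity tool reports."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

print(f"toy PPL: {perplexity([-1.2, -0.7, -2.3]):.4f}")  # ~4.0552

# Assumed unit: 1 "point" = 0.0001 ppl (fourth decimal place).
drops = {"Q2K": (9.8077, 533), "Q4KM": (8.7858, 976), "Q6": (8.6070, 234)}
for quant, (base_ppl, points) in drops.items():
    print(f"{quant}: {base_ppl:.4f} -> {base_ppl - points / 10_000:.4f}")
```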
 
@@ -49,11 +65,22 @@ Same great model... turbo charged.
 
  This is the first group of remasters.
 
- More Coming soon...
+ <B>The FOUR Horsemen:</B>
 
- It will be followed by a "reg quant plus", which added additional components into the GGUF (all) at floating point 32
- precision to further increases the sheer creatity and raw AI horsepower.
+ This repo will be followed by a "reg quant plus" repo, which adds additional components into the GGUF (all levels) at floating point 32
+ precision to further increase the sheer creativity and raw AI horsepower.
 
- Following this group will be a full precision Imatrix, and Imatrix Plus repo that will push the limit even more.
+ This process shaves an extra 50-100 points off perplexity... again.
 
- Details of all methods employed to make this high precision remasters will be posted shortly along with comparsions of orginal model and new ultra remaster.
+ Following this group will be a full float 32 precision Imatrix repo (including reg quants "imatrixed").
+
+ Test results VS the original and "ultra" regular quants will be posted when they come in.
+
+ An Imatrix Plus repo (with the same floating point 32 enhancement as "reg quant plus") will push the limit even more.
+
+ Details of all methods (and pitfalls to avoid) employed to make these high-precision remasters will be
+ posted shortly, along with comparisons of the original model and the new ultra remaster.
+
+ Thanks again to Jeb Carter, the original creator of "Psyonic-Cetacean 20B":
+
+ [ https://huggingface.co/jebcarter/psyonic-cetacean-20B ]
 