InferenceIllusionist committed on
Commit f3aba3f
1 Parent(s): c18ffb4

Update README.md

Files changed (1)
  1. README.md +81 -0
README.md CHANGED
@@ -1,3 +1,84 @@
  ---
+ base_model: [ibm/merlinite-7b]
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+ - GGUF
  license: apache-2.0
  ---
+
+
+ # Excalibur-7b GGUF
+
+ <img src="https://i.imgur.com/viIO4WT.png" width="550"/>
+
+ <i>Image generated with Envoid's [Model9](https://huggingface.co/Envoid/model9) SDXL model</i>
+
+ The FP16 version can be found [here](https://huggingface.co/InferenceIllusionist/Excalibur-7b)
+
+ [Magic-Dolphin-7b](https://huggingface.co/InferenceIllusionist/Magic-Dolphin-7b) was an unexpected surprise, and I was profoundly satisfied with it as a first attempt. For this follow-up I wanted to target the MMLU benchmark specifically.
+ The challenge this time was placing more weight on Merlinite-7b, an unknown quantity that hasn't been in the spotlight despite its novel LAB tuning method.
+
+ <b>Excalibur-7b</b> builds on past success and is the culmination of several learnings:
+ * Measuring KL-divergences for new quantization types brought a deeper understanding of benchmarking and assessing model performance
+ * This significantly sped up the testing process: using MMLU as a baseline, over 10 candidate linear merges were narrowed down to 1: merliniteX-blockB1 (a sketch of this kind of linear merge config follows the list)
+ * Reaching the limitations of linear merging necessitated a pivot to reviewing the viability of the SLERP, DARE-TIES, and Passthrough methods
+ * Thus a competing pool of candidate merges was tested across different merge algorithms. Once more the list was narrowed from 10 candidates to 1: merliniteX-blockF2
+ * merliniteX-blockF2 (a SLERP of Magic-Dolphin-7B and jaskier-7b-dpo in unorthodox proportions) was originally planned for release earlier this week
+ * Instead, -blockB1 and -blockF2 were merged and the results were placed head to head in a final round of tests. Ultimately a more conventional execution of SLERP showed the best results for the final step.
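+
+ For readers unfamiliar with the candidate screening described above, a linear merge in mergekit is configured roughly as follows. This is a minimal, hypothetical sketch: the model names and weights are illustrative placeholders, not the actual recipe behind merliniteX-blockB1.
+
+ ```yaml
+ # Hypothetical linear merge sketch - placeholder models and weights,
+ # NOT the exact merliniteX-blockB1 recipe.
+ models:
+   - model: ibm/merlinite-7b
+     parameters:
+       weight: 0.6
+   - model: InferenceIllusionist/Magic-Dolphin-7b
+     parameters:
+       weight: 0.4
+ merge_method: linear
+ dtype: float16
+ ```
+
+ Screening candidates then amounts to adjusting the weights (and model list), rebuilding, and re-running the MMLU evaluation for each variant.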
+
+ # Sample Question
+
+ <img src="https://i.imgur.com/fdFYIhv.jpeg" width="550"/>
+
+ # Bonus Question - Vision Capabilities
+
+ <b>Requires additional [mistral-7b-mmproj-v1.5-Q4_1.gguf](https://huggingface.co/koboldcpp/mmproj/tree/main) file for vision functionality</b>
+ <img src="https://i.imgur.com/4wbUrjf.jpeg" width="550"/>
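+
+ (In llama.cpp-based frontends such as KoboldCpp this generally means loading the mmproj file alongside the main GGUF, e.g. via an `--mmproj`-style option; the exact setting name varies by frontend and version, so check your loader's documentation.)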
+
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+
+ ## Merge Details
+ ### Merge Method
+
+ This model was merged using the SLERP merge method.
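+
+ As a quick refresher (the standard definition, not anything mergekit-specific): spherical linear interpolation between two weight vectors $q_0$ and $q_1$ with interpolation factor $t$ is
+
+ $$\mathrm{slerp}(q_0, q_1; t) = \frac{\sin((1-t)\theta)}{\sin\theta}\, q_0 + \frac{\sin(t\theta)}{\sin\theta}\, q_1, \qquad \theta = \arccos\left(\frac{q_0 \cdot q_1}{\lVert q_0 \rVert \, \lVert q_1 \rVert}\right)$$
+
+ The `t` values in the configuration below vary this interpolation factor across layer ranges and tensor types (self-attention vs. MLP), so different parts of the network lean more toward one parent model than the other.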
+
+ ### Models Merged
+
+ The following models were included in the merge:
+ * models/merliniteX-blockB1
+ * models/merliniteX-blockF2
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ slices:
+   - sources:
+       - model: models/merliniteX-blockF2
+         layer_range: [0, 32]
+       - model: models/merliniteX-blockB1
+         layer_range: [0, 32]
+ # or, the equivalent models: syntax:
+ # models:
+ #   - model: psmathur/orca_mini_v3_13b
+ #   - model: garage-bAInd/Platypus2-13B
+ merge_method: slerp
+ base_model: models/merliniteX-blockF2
+ parameters:
+   t:
+     - filter: self_attn
+       value: [1, 0.7, 0.3, 0.5, 0]
+     - filter: mlp
+       value: [0, 0.3, 0.7, 0.5, 1]
+     - value: 0.5 # fallback for rest of tensors
+ dtype: float16
+ ```
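+
+ If you want to reproduce a merge like this, a config of this shape is typically passed to mergekit's `mergekit-yaml` entry point (for example `mergekit-yaml config.yaml ./output-model-directory`); flags and defaults differ between mergekit versions, so treat that invocation as a sketch rather than an exact command.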