---
base_model:
- inflatebot/MN-12B-Mag-Mell-R1
- TheDrummer/UnslopNemo-12B-v4.1
- ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2
- DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS
library_name: transformers
tags:
- mergekit
- merge
- 12b
- chat
- roleplay
- creative-writing
- DELLA-linear
license: apache-2.0
new_version: redrix/AngelSlayer-12B-Unslop-Mell-RPMax-DARKNESS-v2
---
# AngelSlayer-12B-Unslop-Mell-RPMax-DARKNESS
> They say ‘He’ will bring the apocalypse. <span style="color:darkred">She</span> seeks understanding, not destruction.

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

This is my fourth model. I wanted to test *della_linear*. The point of this model was to use the negative properties of [DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS](https://huggingface.co/DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS) to counter potential positivity bias while maintaining stability.
## Testing stage: testing
**(18/12/2024):** The model seems to hold up very well over context and keeps to the character/prompt nicely. Its prose is expansive and varied, and mostly free of GPTisms. The only problem is that the model always seems to interpret the input in a similar manner (probably due to the *self_attn* layers). As a result, the output tends to follow a certain theme/direction even if the wording differs per swipe (the longer the response, the further it drifts from the direction set at the beginning). A peculiar quirk is that errors are predictable: if the model writes the user's name incorrectly (scrambled letters, etc.; I myself have a more complex name), it will ALWAYS misspell that instance of the name in subsequent swipes. It does correct itself within a response, though: if the first instance of the name is spelt wrong, later instances will be right.

Repetition is low, and *DRY* can help if it does appear, but I've not had it pick up on any patterns. A *higher Temperature* (1.25) seems to work better, and sometimes the model gives quite impressive answers. *XTC* can improve it a lot without decreasing intelligence, but I haven't yet pinned down the difference between responses from *neutralized samplers* and from *XTC*. If the model gives bogus output on swipes, add some characters at the end of your input to sort of scramble the output (a few asterisks, or a useless extra sentence if you so desire).

**EDIT:** This 'theme' of similar swipes seems to be an issue with [inflatebot/MN-12B-Mag-Mell-R1](https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1). Perhaps I'll reduce its weight, or balance it with [ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2](https://huggingface.co/ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2) by putting that as the last model (the model order matters with *DELLA-Linear*; 'lower' models in the config carry more influence). I could also experiment with the base models that [inflatebot/MN-12B-Mag-Mell-R1](https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1) is built from and remerge the whole model with different merge methods to try to alleviate this issue.

## Parameters
- **Context size:** Not more than *20k* recommended - coherency may degrade.
- **Chat Template:** *ChatML*
- **Samplers:** A *Temperature-Last* of 1-1.25 and a *Min-P* of 0.1-0.25 are viable, but these values haven't been fine-tuned. Activate *DRY* if repetition appears. *XTC* seems to work well.
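
To make the sampler settings above concrete, here is a minimal pure-Python sketch of how *Min-P* filtering followed by temperature scaling ("temperature last") is commonly described. It is an illustration only, not the implementation used by any particular backend; the function names `min_p_temperature_last` and `sample` and the toy logits are hypothetical.

```python
import math
import random


def min_p_temperature_last(logits, min_p=0.1, temperature=1.25):
    """Filter a {token: logit} dict with Min-P, then apply temperature."""
    # Softmax at temperature 1 gives the unscaled distribution.
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Min-P: keep tokens with probability >= min_p * (top probability).
    threshold = min_p * max(probs.values())
    kept = {tok: logits[tok] for tok, p in probs.items() if p >= threshold}

    # Temperature last: rescale only the surviving logits, then renormalise.
    peak = max(kept.values())
    scaled = {tok: math.exp((val - peak) / temperature) for tok, val in kept.items()}
    total = sum(scaled.values())
    return {tok: s / total for tok, s in scaled.items()}


def sample(dist, rng=random):
    """Draw one token from a {token: probability} dict."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    toy_logits = {"a": 2.0, "b": 1.0, "c": -3.0}  # made-up values
    dist = min_p_temperature_last(toy_logits, min_p=0.1, temperature=1.25)
    print(dist, sample(dist))
```

Because the filter runs before the temperature is applied, a higher temperature flattens the surviving candidates without letting very unlikely tokens back in, which may be why the higher end of the temperature range stays usable here.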

## Quantization
- Static **GGUF** Quants available at [mradermacher/AngelSlayer-12B-Unslop-Mell-RPMax-DARKNESS-GGUF](https://huggingface.co/mradermacher/AngelSlayer-12B-Unslop-Mell-RPMax-DARKNESS-GGUF)
- iMatrix Quants available at [mradermacher/AngelSlayer-12B-Unslop-Mell-RPMax-DARKNESS-i1-GGUF](https://huggingface.co/mradermacher/AngelSlayer-12B-Unslop-Mell-RPMax-DARKNESS-i1-GGUF)

❤️ Thanks.

## Merge Details
### Merge Method

This model was merged using the della_linear merge method using [TheDrummer/UnslopNemo-12B-v4.1](https://huggingface.co/TheDrummer/UnslopNemo-12B-v4.1) as a base. 

### Models Merged

The following models were included in the merge:
* [inflatebot/MN-12B-Mag-Mell-R1](https://huggingface.co/inflatebot/MN-12B-Mag-Mell-R1)
* [ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2](https://huggingface.co/ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2)
* [DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS](https://huggingface.co/DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: TheDrummer/UnslopNemo-12B-v4.1
    parameters:
      weight: 0.25
      density: 0.6
  - model: ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2
    parameters:
      weight: 0.25
      density: 0.6
  - model: DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS
    parameters:
      weight: 0.2
      density: 0.4
  - model: inflatebot/MN-12B-Mag-Mell-R1
    parameters:
      weight: 0.30
      density: 0.7
base_model: TheDrummer/UnslopNemo-12B-v4.1
merge_method: della_linear
dtype: bfloat16
chat_template: "chatml"
tokenizer_source: union
parameters:
  normalize: false
  int8_mask: true
  epsilon: 0.05
  lambda: 1

```
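
The numbers in this configuration can be sanity-checked with a short script. The sketch below assumes the reading given in mergekit's documentation for the DELLA methods, where `epsilon` bounds how far the per-parameter probability may move away from `density` (from `density - epsilon` to `density + epsilon`, which must stay between 0 and 1). The helper names `probability_band` and `summarize` are hypothetical; the values are copied from the YAML above.

```python
# Weights and densities copied from the merge configuration in this card.
MODELS = {
    "TheDrummer/UnslopNemo-12B-v4.1": {"weight": 0.25, "density": 0.6},
    "ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.2": {"weight": 0.25, "density": 0.6},
    "DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-DARKNESS": {"weight": 0.2, "density": 0.4},
    "inflatebot/MN-12B-Mag-Mell-R1": {"weight": 0.30, "density": 0.7},
}
EPSILON = 0.05


def probability_band(density, epsilon):
    """Return (density - epsilon, density + epsilon), checked to lie in (0, 1)."""
    low, high = density - epsilon, density + epsilon
    if not (0.0 < low and high < 1.0):
        raise ValueError(f"band ({low}, {high}) leaves the open interval (0, 1)")
    return low, high


def summarize(models, epsilon):
    """Return the total weight and the per-model probability band."""
    total_weight = sum(m["weight"] for m in models.values())
    bands = {name: probability_band(m["density"], epsilon) for name, m in models.items()}
    return total_weight, bands


if __name__ == "__main__":
    total, bands = summarize(MODELS, EPSILON)
    print(f"total weight: {total:.2f}")
    for name, (low, high) in bands.items():
        print(f"{name}: {low:.2f} to {high:.2f}")
```

With `normalize: false` the weights are used as written rather than rescaled, so it is worth confirming their total whenever a weight is changed.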

> [Today we hustle, 'day we hustle but tonight we play.](https://www.youtube.com/watch?v=-UjA03imoNI)