---
base_model:
  - inflatebot/MN-12B-Mag-Mell-R1
  - TheDrummer/UnslopNemo-12B-v4
library_name: transformers
tags:
  - mergekit
  - merge
  - 12b
  - chat
  - roleplay
  - creative-writing
  - NuSLERP
license: apache-2.0
---

# patricide-12B-Unslop-Mell-v2

*The sins of the Father shan't ever be repeated this way.*

![Patricide Logo](PatricideLogo.png)

This is a merge of pre-trained language models created using mergekit.

This is my seventh model. I decided to use TheDrummer/UnslopNemo-12B-v4 instead of TheDrummer/UnslopNemo-12B-v4.1, as v4 supposedly has a stronger anti-GPTism influence at the cost of some intelligence, so I'll be using it in future merges; the loss could most likely be counteracted by adding more intelligent models. TheDrummer said that the Metharme/Pygmalion templates have a stronger anti-GPTism effect, but those specific tokens aren't enforced or present in the tokenizer, and I prefer ChatML. Thus I picked the model that has more anti-GPTism influence in its base state. I also tweaked the parameters to be more balanced, while generally testing NuSLERP; if I find better parameters I might release a V2B of some kind. I still haven't had much time to test this exhaustively, as I'm also working on other projects.

**Testing stage:** early testing

I do not know how this model holds up at long context. Early testing showed stability and viable answers.

## Parameters

- **Context size:** No more than 20k recommended; coherency may degrade beyond that.
- **Chat template:** ChatML. Metharme/Pygmalion (as per UnslopNemo) may work, but its effects are untested.
- **Samplers:** A temperature of 1 (applied last in the sampler order) and a min-p of 0.1 are viable, but haven't been fine-tuned. Activate DRY if repetition appears. XTC is untested.
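For intuition, the min-p setting above keeps only tokens whose probability is at least `min_p` times the top token's probability. This is an illustrative NumPy sketch, not the sampler code of any particular backend:

```python
import numpy as np

def min_p_filter(logits: np.ndarray, min_p: float = 0.1) -> np.ndarray:
    """Mask logits whose probability falls below min_p * (top token probability)."""
    probs = np.exp(logits - logits.max())  # softmax, numerically stable
    probs /= probs.sum()
    threshold = min_p * probs.max()
    # Masked tokens get -inf so they can never be sampled.
    return np.where(probs >= threshold, logits, -np.inf)

# A peaked distribution keeps few candidates; a flat one keeps many.
logits = np.array([5.0, 4.9, 2.0, -1.0])
filtered = min_p_filter(logits, min_p=0.1)
print(np.isfinite(filtered))  # only the two near-top tokens survive
```

With temperature applied last, the distribution is rescaled only after this pruning, which is why a temperature of 1 with min-p 0.1 stays coherent.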

## Quantization

Soon...

## Merge Details

### Merge Method

This model was merged using the NuSLERP merge method.

### Models Merged

The following models were included in the merge:

- TheDrummer/UnslopNemo-12B-v4
- inflatebot/MN-12B-Mag-Mell-R1

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: TheDrummer/UnslopNemo-12B-v4
    parameters:
      weight: [0.6, 0.5, 0.3, 0.5, 0.6]
  - model: inflatebot/MN-12B-Mag-Mell-R1
    parameters:
      weight: [0.4, 0.5, 0.7, 0.5, 0.4]
merge_method: nuslerp
dtype: bfloat16
chat_template: "chatml"
tokenizer:
  source: union
parameters:
  normalize: true
  int8_mask: true
```
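As a toy sketch of what the config does (plain SLERP on flattened tensors, not mergekit's actual NuSLERP implementation, which also supports row/column-wise normalization), the five-point `weight` lists are stretched across the layer stack, and each pair of tensors is blended by spherical interpolation:

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical interpolation between two flattened weight tensors."""
    a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if omega < 1e-8:  # nearly parallel: fall back to linear interpolation
        return (1 - t) * a + t * b
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b

def gradient_weight(anchors: list[float], layer: int, num_layers: int) -> float:
    """Linearly interpolate a mergekit-style weight-gradient list across layers."""
    xs = np.linspace(0, num_layers - 1, len(anchors))
    return float(np.interp(layer, xs, anchors))

# Mag-Mell's influence peaks at 0.7 in the middle layers, per [0.4, 0.5, 0.7, 0.5, 0.4].
t_mid = gradient_weight([0.4, 0.5, 0.7, 0.5, 0.4], layer=20, num_layers=40)
merged = slerp(np.random.randn(16), np.random.randn(16), t_mid)
```

The gradients here are mirror images of each other, so the two parents always sum to 1 at every layer, with UnslopNemo dominating the ends of the stack and Mag-Mell the middle.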