metadata

license: cc-by-nc-4.0
base_model:
  - Netrve/Miqu-PlayMaid-70B-v0.1
  - ShinojiResearch/Senku-70B
library_name: transformers
tags:
  - not-for-all-audiences
  - nsfw
  - mergekit
  - merge

aranea-tenebris-120b-v1.0

aka Netrve/Miqu-PlayMaid-70B-v0.1 + ShinojiResearch/Senku-70B
Model merge for uncensored creative writing and rp

A mergekit frankenmerge based on Netrve/Miqu-PlayMaid-70B-v0.1 with interleaved layers of ShinojiResearch/Senku-70B.
This was the top-performing model from a second series of merge experiments aimed at creating a highly coherent creative writing and rp model.
Tests consisted of a series of private DnD scenario benchmarks, with manual comparison of the most promising merges.

A number of different base models, interleave models and layer offsets were compared.
This model outperformed a number of other popular 70B+ models and merges in both creativity and coherency tests. It was (briefly) compared to Mixtral 8x22B running 2/3/4 experts.

Usable context: ~32768
Recommended prompt format: Alpaca
Layers: 137
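
For readers unfamiliar with the Alpaca format, here is a minimal Python sketch of the commonly used instruction-only Alpaca template. The helper name `build_alpaca_prompt` is hypothetical, and the exact system line is an assumption taken from the standard Alpaca template rather than anything specified for this model, so adjust it to whatever your frontend expects.

```python
# Standard instruction-only Alpaca template (assumed; this card only says "Alpaca").
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a single instruction in the Alpaca template (hypothetical helper)."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    print(build_alpaca_prompt("Narrate the party's arrival at the ruined keep."))
```

The model's completion is then expected to follow the final `### Response:` line.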

Quantization

llama.cpp imatrix.dat
exllamav2 measurement.json

Will upload a few quants when bandwidth permits.

Testing

Two different writing styles were considered for each testing scenario:

Completions for 3rd person narration. No character role was assumed.
Completions for 1st and 2nd person turn-based (out-of-order) rp. A character role was assumed by the model, but narration of minor characters and events was encouraged.

Tests assumed a mature audience, but a range of scenarios were constructed.
Thematic inconsistency or bias in character behaviour was penalized heavily.

Models showing the following were penalized during manual comparison:

Consistently short responses.
Laziness, or readily giving up on solving a character problem.
Overly malleable characters that could not hold opinions or beliefs.
Passiveness or an inability to drive the narrative.
Persistent repeats. Bad merges tend to latch onto and reuse specific keywords.
Ignoring or missing obvious scenario solutions.
Impersonating other major characters out of turn during rp tests.
Failure to follow a character's description. This criterion is pretty broad, and could include things like character skills, refusals etc.
Major inconsistencies in scenes or recall. Note - invention of thematically consistent detail was encouraged.

Interesting observations from benchmarking

A 10 layer interleave stride with a 20 layer interleave width consistently outperformed alternative combinations for coherency.
An 8 layer interleave stride with a 16 layer interleave width consistently outperformed alternative combinations for creativity whilst remaining reasonably coherent.
Regular stride intervals are not optimal. In particular, offsetting the first or last set of base model layers often improved metrics.
Goliath-120B is still a good standard for coherency below 4096 context. A few miqu-1 merges are comparable, but testing found that a small amount of coherency could be sacrificed for notable creativity improvements.
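
To make the stride and width terminology concrete, the following Python sketch enumerates layer slices for a regular interleave. It rests on several assumptions that are not stated in this card: "width" is taken to mean the number of layers in each slice, "stride" the offset between the starts of consecutive slices, slices alternate between the two source models, and each 70B model is assumed to have 80 layers. The function names are hypothetical and this is not the recipe used for the released model.

```python
from typing import List, Sequence, Tuple

# (model label, start layer inclusive, end layer exclusive)
Slice = Tuple[str, int, int]


def interleave_slices(
    num_layers: int,
    stride: int,
    width: int,
    models: Sequence[str] = ("base", "interleave"),
) -> List[Slice]:
    """Enumerate regularly spaced, overlapping slices, alternating source models."""
    if stride <= 0 or width <= 0 or width > num_layers:
        raise ValueError("stride and width must be positive and width <= num_layers")
    slices: List[Slice] = []
    start = 0
    index = 0
    # Stop once a full-width slice would run past the last layer.
    while start + width <= num_layers:
        slices.append((models[index % len(models)], start, start + width))
        start += stride
        index += 1
    return slices


def total_layers(slices: Sequence[Slice]) -> int:
    """Layer count of the stacked merge: the sum of all slice lengths."""
    return sum(end - start for _, start, end in slices)


if __name__ == "__main__":
    # Assumed 80-layer source models, stride 10, width 20.
    plan = interleave_slices(num_layers=80, stride=10, width=20)
    for model, start, end in plan:
        print(f"{model:>10}: layers [{start}, {end})")
    print("total layers:", total_layers(plan))  # 140 under these assumptions
```

Under these assumptions a fully regular 10/20 scheme yields 140 layers, not the 137 listed above, which fits the observation that irregular offsets at the first or last slices performed better. The sketch does not attempt to reproduce those offsets.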