---
license: cc-by-nc-4.0
base_model:
- NousResearch/Hermes-3-Llama-3.1-70B
- Sao10K/L3.1-70B-Hanami-x1
library_name: transformers
tags:
- mergekit
- merge
---
# Hanames-90B-L3.1

It's a stack merge meme model made from Hermes 3 and Hanami-x1. It uses a similar formula to my previous stack merge, but swaps in Hanami-x1 and adds some mild slerping of the slices. Coherence seems to be improved as a result, while the model remains fun to use. You should use it for roleplay and creative writing AND PROBABLY NOTHING ELSE (but hey, it's your funeral).
## Stack Merge Disclaimer
Yes, it's just a stack merge; no, I didn't do any additional pretraining; no, stack merges don't make the model smarter; yes, they harm its ability to do complex logical tasks; yes, they introduce some weird behaviors and unexpected mistakes; no, they don't make the model sentient; no, you shouldn't post on Twitter about how adding a few layers turned it into AGI; etc., etc.
That said, it does feel unique and fun to use. If you're the type of person who's drowning in VRAM and would rather have some more variety at the expense of needing to make a few manual edits to clean up mistakes, give it a try.
## Format
ChatML
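For reference, a ChatML prompt wraps each turn in `<|im_start|>`/`<|im_end|>` tokens like this (the system and user text below are just placeholders):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```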
## Samplers

Because stack merges introduce some unexpected noise into the model, I recommend a higher min_p than normal. I've been getting good results with min_p 0.09-0.11 -> temp 0.8-1.0; add your favorite anti-repetition sampler as needed.
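If you're unfamiliar with what min_p actually does: it discards any token whose probability is below `min_p` times the probability of the most likely token, so a higher value prunes the noisy tail more aggressively. A toy sketch of the filter (not any backend's actual implementation):

```python
import math

def min_p_filter(logits, min_p=0.1, temperature=0.9):
    # Softmax over temperature-scaled logits.
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Drop tokens whose probability falls below min_p * top probability.
    cutoff = min_p * max(probs)
    kept = [p if p >= cutoff else 0.0 for p in probs]
    # Renormalize the surviving tokens.
    z = sum(kept)
    return [p / z for p in kept]
```

With the settings above, unlikely tokens (the kind a noisy stack merge over-promotes) get zeroed out before sampling.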
## Configuration
The following YAML configuration was used to produce this model:
```yaml
slices:
  - sources:
      - model: ../Hermes-3-Llama-3.1-70B
        layer_range: [0, 21] # 21
      - model: ../L3.1-70B-Hanami-x1
        layer_range: [0, 21]
    parameters:
      t: 0.2
  - sources:
      - model: ../L3.1-70B-Hanami-x1
        layer_range: [16, 36] # 20
      - model: ../Hermes-3-Llama-3.1-70B
        layer_range: [16, 36]
    parameters:
      t: 0.8
  - sources:
      - model: ../Hermes-3-Llama-3.1-70B
        layer_range: [30, 50] # 20
      - model: ../L3.1-70B-Hanami-x1
        layer_range: [30, 50]
    parameters:
      t: 0.2
  - sources:
      - model: ../L3.1-70B-Hanami-x1
        layer_range: [40, 64] # 24
      - model: ../Hermes-3-Llama-3.1-70B
        layer_range: [40, 64]
    parameters:
      t: 0.8
  - sources:
      - model: ../Hermes-3-Llama-3.1-70B
        layer_range: [60, 80] # 20
      - model: ../L3.1-70B-Hanami-x1
        layer_range: [60, 80]
    parameters:
      t: 0.2
merge_method: slerp
base_model: ../Hermes-3-Llama-3.1-70B
dtype: bfloat16
tokenizer_source: ../Hermes-3-Llama-3.1-70B
```
In the first few iterations I tried merging the tokenizers in an attempt to support both ChatML and L3 formats, but it ended up breaking both of them. I also tried lower and higher slerp ratios, but this seems like the sweet spot.
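For the curious, the `t` values in the config are slerp interpolation weights: they control how far each slice moves from one model toward the other along the great circle between the two weight vectors, rather than along a straight line. A toy sketch of slerp on flat vectors (purely illustrative, not mergekit's actual code, which operates per-tensor):

```python
import math

def slerp(t, v0, v1):
    # Spherical linear interpolation between two flat weight vectors.
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    theta = math.acos(max(-1.0, min(1.0, dot)))
    if theta < 1e-6:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]
```

At `t: 0.2` the slice stays close to the first listed source; at `t: 0.8` it leans heavily toward the second.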
All credit goes to the original finetuners, I'm just some dummy who can write mergekit configs.
:*