Description
After I put down the joint and RTFM'd, I have a better idea of exactly what's going on. I considered doing something similar with WANDA or SparseGPT a while back, but stopped when I ran into issues, so I'm fascinated by how this new method pulls it off.
Hypothesis
By lowering the density, I should land closer to the sweet spot shown in the paper. I'm also using my fixed base model, which should help as well. The weights are adjusted to pull the later layers more into alignment with ORCA 2.
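For intuition about what the density knob does, here is a minimal sketch (my own illustration, not mergekit's actual code): DARE keeps a random fraction of each task vector (fine-tune minus base) equal to the density, and rescales the survivors so the expected contribution is preserved. Assuming PyTorch tensors for the weights:

import torch

def dare_delta(base: torch.Tensor, finetuned: torch.Tensor, density: float) -> torch.Tensor:
    # Drop-And-REscale: keep ~`density` of the task vector at random,
    # then divide by density so the expected magnitude is unchanged.
    delta = finetuned - base
    mask = torch.bernoulli(torch.full_like(delta, density))
    return delta * mask / density

# Toy usage: lower density keeps fewer delta weights but scales them up.
base = torch.zeros(4, 4)
finetuned = torch.randn(4, 4)
sparse_delta = dare_delta(base, finetuned, density=0.35)
merged = base + 0.60 * sparse_delta  # weight 0.60, as in the recipe below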
Results
I'm quite happy with this model for what it is: a personable and effective assistant. It does infodump a bit, but what genius doesn't? It writes okay erotica and general fiction, though with a somewhat "artificial" tone.
Recipe
merge_method: dare_ties
base_model: athirdpath/BigLlama-20b
models:
  - model: athirdpath/CleverGirl-20b
    parameters:
      weight: 0.60
      density: 0.35
  - model: athirdpath/CleverGirl-20b-Inverted
    parameters:
      weight: 0.40
      density: 0.30
parameters:
  int8_mask: true
dtype: bfloat16
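To reproduce (assuming a standard mergekit install; the file and output names here are just examples, not part of the original recipe), save the block above as config.yml and run:

mergekit-yaml config.yml ./output-model-directory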