Notes

This is an experiment to try the extra SLERP parameters that @bamec66557 uses in bamec66557/Qwen-2.5-14B-MINUS, applied to the models I'm working on now. Do they make a difference in mergekit-gui? We'll see.

Models Merged

The merge included sometimesanotion/Qwen2.5-14B-Vimarckoso-v3 (the base model) and sometimesanotion/Lamarck-14B-v0.6.

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model: sometimesanotion/Lamarck-14B-v0.6
merge_method: slerp
base_model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
dtype: float32
out_dtype: bfloat16
parameters:
  t: [0.3, 0.6, 0.8, 0.6, 0.3]
  
regularization:
  - method: gradient_penalty
    scale: 0.07
  - method: weight_clipping
    clip_range: [-0.2, 0.2]
  - method: random_noise
    scale: 0.005
  - method: attention_dropout
    scale: 0.03

postprocessing:
  - operation: entropy_regularization
    scale: 0.07
  - operation: non_linear_scaling
    parameters:
      function: gelu
  - operation: sharpening
    intensity: 0.7
  - operation: gaussian_smoothing
    sigma: 0.2
  - operation: normalize
  - operation: dynamic_scaling
    scale_range: [0.97, 1.03]
  - operation: smoothing
    parameters:
      adaptive: true
      range: [0.97, 1.03]
      kernel_size: 5
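
For readers unfamiliar with what `merge_method: slerp` and the `t` list control, here is a minimal sketch in Python with NumPy. It is not mergekit's code. It assumes that `t = 0` returns the base model's weights and `t = 1` the other model's, and that a list-valued `t` is treated as anchor points linearly interpolated over layer depth. The helper names `slerp` and `layer_t`, and the layer count of 48, are hypothetical choices made for illustration.

```python
import numpy as np


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors."""
    v0 = np.asarray(v0, dtype=np.float64)
    v1 = np.asarray(v1, dtype=np.float64)
    # Measure the angle between the normalized directions of the two vectors.
    u0 = v0 / (np.linalg.norm(v0) + eps)
    u1 = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(u0, u1), -1.0, 1.0)
    theta = np.arccos(dot)
    # Nearly parallel vectors: fall back to plain linear interpolation.
    if np.sin(theta) < 1e-6:
        return (1.0 - t) * v0 + t * v1
    s0 = np.sin((1.0 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1


def layer_t(anchors, layer_idx, num_layers):
    """Linearly interpolate a list of anchor values over layer depth."""
    pos = layer_idx / max(num_layers - 1, 1)  # relative depth in [0, 1]
    xs = np.linspace(0.0, 1.0, len(anchors))  # evenly spaced anchor positions
    return float(np.interp(pos, xs, anchors))


anchors = [0.3, 0.6, 0.8, 0.6, 0.3]  # the t list from the configuration
num_layers = 48  # assumption: illustrative layer count
schedule = [layer_t(anchors, i, num_layers) for i in range(num_layers)]
```

Under these assumptions, the first and last layers stay closest to the base model (`t = 0.3`), while the middle layers lean most toward the second model (`t` approaching 0.8).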

sometimesanotion/Qwen2.5-14B-MinusLike-Slerp-Experimental
