metadata
base_model:
  - Sao10K/Fimbulvetr-11B-v2
library_name: transformers
tags:
  - mergekit
  - merge
license: apache-2.0
pipeline_tag: text-generation


Thank you @brooketh for the iMat + static GGUFs on the Faraday model hub!

Thank you @mradermacher for also making GGUFs and iMat GGUFs!

Chaifighter 20B v2 (aaaaand it's BASICALLY a 20B this time!)

Meet Chaifighter 20B v2, my flagship Mistral 20B frankenmerge model! Boasting creativity, coherence, and cognitive thinking, this model is a great pick for those awkwardly stuck between 13Bs and 34Bs.

I also wanted to provide an alternative to Jeb Carter's Psyonic Cetacean 20B, which is a fantastic model that you should check out if you haven't already! The issue with that model is that it's based on Llama 2, which is outdated now. The older architecture lacked many performance enhancements that were introduced by the Mistral architecture, and on my 16 GB RTX 4060 Ti, those performance enhancements were the difference between decently speedy and intolerably sluggish.

Chaifighter 20B is geared towards long-form roleplay chats rather than short-form IRC/Discord RP chats. It loves verbosity and detail, and its quality will depend on how much "ammunition" you can give it. While it sorta-kinda can do short-form with some swiping, it isn't really ideal. But for those essay-writing powerhouses that love typing up a storm in the character card, this one's for you.

Chaifighter 20B natively supports a maximum context window of only 4096 tokens.
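
Since 4096 tokens fills up fast in long-form RP, here's a rough sketch of how a frontend might drop the oldest messages to stay under the limit. Everything in it is an assumption for illustration: `trim_history` is a hypothetical helper, the `reserve` value is arbitrary, and the default token counter just splits on whitespace, so swap in the model's real tokenizer for accurate counts.

```python
def trim_history(messages, max_tokens=4096, reserve=512, count_tokens=None):
    """Keep the newest messages that fit in the context budget.

    max_tokens   -- the model's context window (4096 for Chaifighter 20B)
    reserve      -- tokens held back for the reply (arbitrary example value)
    count_tokens -- callable returning a token count for a string;
                    defaults to a crude whitespace split (an assumption)
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())

    budget = max_tokens - reserve
    kept = []
    used = 0
    # Walk from newest to oldest and stop at the first message that won't fit.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    # Restore chronological order.
    return list(reversed(kept))


history = ["one two three", "four five", "six"]
trimmed = trim_history(history, max_tokens=5, reserve=2)
```

With the toy numbers above, the budget is 3 "tokens", so only the two newest messages survive. Most frontends already do something like this for you; the sketch is only meant to show the idea.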

Prompt Template: Alpaca

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
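
If you're building prompts by hand, the template above can be filled in with a few lines of Python. This is just a minimal sketch: `build_alpaca_prompt` is a hypothetical helper (not part of any library), and ending the prompt with a newline after `### Response:` is my assumption about how the template is meant to be used.

```python
# The Alpaca template from this card, with {prompt} as the only placeholder.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "{prompt}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(prompt: str) -> str:
    """Insert the user's instruction into the Alpaca template."""
    return ALPACA_TEMPLATE.format(prompt=prompt.strip())


text = build_alpaca_prompt("Describe the tavern as the party walks in.")
```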

Recommended Settings: Universal-Light

Here are some settings ranges that tend to work for me. They aren't strict values, and there's a bit of leeway in them. Feel free to experiment a bit!

  • Temperature: 1.0 to 1.25 (adjust to taste, but keep it low. Chaifighter is creative enough on its own)
  • Min-P: 0.1 (increasing might help if it goes cuckoo, but I suggest keeping it there)
  • Repetition Penalty: 1.05 to 1.1 (high values aren't needed and usually degrade output)
  • Rep. Penalty Range: 256 or 512
  • (all other samplers disabled)
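
For anyone curious what Temperature and Min-P actually do to the token distribution, here's a small pure-Python sketch. It assumes temperature is applied before Min-P (backends differ on sampler order), it doesn't model repetition penalty at all, and `min_p_filter` is a hypothetical name, not any library's API.

```python
import math


def min_p_filter(logits, temperature=1.0, min_p=0.1):
    """Return renormalized probabilities after temperature and Min-P.

    Min-P keeps every token whose probability is at least
    min_p * (probability of the most likely token) and drops the rest.
    """
    # Temperature scaling: higher values flatten the distribution.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Min-P cutoff relative to the top token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize so the surviving tokens sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


probs = min_p_filter([2.0, 1.0, 0.0, -3.0], temperature=1.0, min_p=0.1)
```

In this toy example the last token falls below 10% of the top token's probability and gets zeroed out, while the other three stay in play. Raising the temperature lets more tokens clear the cutoff, which is why the two settings are worth tuning together.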

Merge Details

Mergekit

Chaifighter 20B is a frankenmerge of pre-trained language models created using mergekit.

Merge Method

This model was merged using the passthrough merge method.

Models Merged

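The following models were included in the merge:

  • Sao10K/Fimbulvetr-11B-v2
  • SanjiWatsuki/Kunoichi-7B
  • Mytho-Lemon-11B (my own merge of KatyTheCutie/LemonadeRP-4.5.3 and Gryphe/MythoMist-7B)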

The Sauceeeeeee e ee

The following YAML configuration was used to produce this model:

slices:
  - sources:
    - model: Sao10K/Fimbulvetr-11B-v2
      layer_range: [0, 40]
  - sources:
    - model: SanjiWatsuki/Kunoichi-7B
      layer_range: [8, 16]
  - sources:
    - model: Mytho-Lemon-11B # my own merge (see below).
      layer_range: [8, 48]
merge_method: passthrough
dtype: bfloat16

And here's Mytho-Lemon-11B. Yep, named it backwards.

slices:
  - sources:
    - model: KatyTheCutie/LemonadeRP-4.5.3
      layer_range: [0, 24]
  - sources:
    - model: Gryphe/MythoMist-7B # manually added tokenizer files
      layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16

It's a lot better than v1 💀

So, the idea was to start with Fimbulvetr-11B-v2, a super solid RP model that punches wayyy above its weight, especially for its coherence, reasoning, and even spatial awareness. Keeping its layers intact is apparently somewhat unusual, but I wanted it to sit closest to the input layers. I thought it would improve logic and open the door for more creativity later in the stack. I added Kunoichi next for its context and instruction-following skills. This worked very well in v1. Lastly, I used a frankenmerge of MythoMist and LemonadeRP for the last layers. These are pretty creative models with solid writing. MythoMist in theory would give the model flavor and verbosity. LemonadeRP was recommended by a friend, and I thought it really complemented the rest of the mix quite nicely!
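
To see where the "basically a 20B" comes from, the layer stack in the two configs above can be tallied with a few lines of Python. This is a back-of-the-envelope sketch, not anything mergekit outputs: it treats `layer_range` as half-open (start included, end excluded), and the parameter estimate assumes every layer has the standard Mistral 7B dimensions (hidden size 4096, MLP size 14336, 1024-wide key/value projections, 32000-token vocabulary). Those dimensions are my assumption, not something read from the merged checkpoint.

```python
# Slices copied from the two YAML configs in this card.
CHAIFIGHTER_SLICES = [
    ("Sao10K/Fimbulvetr-11B-v2", (0, 40)),
    ("SanjiWatsuki/Kunoichi-7B", (8, 16)),
    ("Mytho-Lemon-11B", (8, 48)),
]
MYTHO_LEMON_SLICES = [
    ("KatyTheCutie/LemonadeRP-4.5.3", (0, 24)),
    ("Gryphe/MythoMist-7B", (8, 32)),
]


def count_layers(slices):
    """Total layers in a passthrough stack (ranges treated as half-open)."""
    return sum(end - start for _, (start, end) in slices)


def estimate_params(num_layers, hidden=4096, intermediate=14336,
                    kv_dim=1024, vocab=32000):
    """Rough parameter count, assuming Mistral-7B-style layers throughout."""
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o + k, v
    mlp = 3 * hidden * intermediate                        # gate, up, down
    norms = 2 * hidden                                     # two norms per layer
    per_layer = attention + mlp + norms
    embeddings = 2 * vocab * hidden                        # input + output head
    return num_layers * per_layer + embeddings + hidden    # + final norm


mytho_lemon_layers = count_layers(MYTHO_LEMON_SLICES)
chaifighter_layers = count_layers(CHAIFIGHTER_SLICES)
chaifighter_params_b = estimate_params(chaifighter_layers) / 1e9
```

Mytho-Lemon-11B comes out to 48 layers, which is why the `[8, 48]` range in the main config is valid, and the full stack comes out to 88 layers. Under the assumed dimensions that lands just under 20 billion parameters.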

Thanks and Other Stuff

I want to thank everyone who helped me make this model. @brooketh, @FallenMerick, @jebcarter, @Qonsol, @PacmanIncarnate, and many others: thank you so much. Without the help, feedback, and encouragement these people gave, Chaifighter v2 would not have happened. The flaws in v1 were numerous and tricky to solve, especially for someone still super new to this (me). I don't know what I'd do without these kindhearted and generous people!

Yapping time. As far as the name is concerned, I'm going for a tea/coffee/hot drink motif for my models, and one of the names I was debating on using for this model was Chai-Latte. As I worked on this merge, I got the idea of naming it "Chaifighter" as a play on "Psyfighter2", one of the models making up Psyonic Cetacean and also a play on a model called "Tiefighter" from which it was derived. Both are fantastic models, especially given their age. They're both worth checking out too if you haven't done so. "Chai" itself is a play on a certain AI chatting website (CAI) that got me into this lovely mess in the first place. So I guess it's fitting to name the first model of the series after it.

And lastly, of course, thank you for checking out my model! Remember that you're super amazing, and have a fantastic day! :)