SuperHOT Prototype 2 w/ 8K Context
This is the second prototype of SuperHOT, an NSFW-focused LoRA, this time a 30B model with 8K context and no RLHF, built using the same technique described in the GitHub blog post. Tests have shown that the model does indeed leverage the extended context at 8K.
Looking for Merged & Quantized Models?
- 30B 4-bit CUDA: tmpupload/superhot-30b-8k-4bit-safetensors
- 30B 4-bit CUDA 128g: tmpupload/superhot-30b-8k-4bit-128g-safetensors
Using the monkey-patch?
You will NEED to apply the monkey-patch. If you are already using the monkey-patch, change the scaling factor to 0.25 and the maximum sequence length to 8192 (a sketch of the patch follows).
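The sketch below illustrates the idea behind the patch, assuming an older transformers release (around 4.30) where `LlamaRotaryEmbedding` caches its cos/sin tables in `__init__`. The scaling factor 0.25 and the 8192 maximum length match the values above; the class and function names here are illustrative, not the published patch itself.

```python
import torch
import transformers.models.llama.modeling_llama as llama


class ScaledRotaryEmbedding(torch.nn.Module):
    """Rotary embedding with interpolated positions (illustrative sketch)."""

    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(device) / dim))
        self.register_buffer("inv_freq", inv_freq)

        # Extend the cache to 8192 positions and shrink each position index by
        # 0.25, so the extended range maps back into the trained 0..2048 span.
        self.scale = 0.25
        self.max_seq_len_cached = 8192
        t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype) * self.scale
        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)

    def forward(self, x, seq_len=None):
        return (
            self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
        )


def apply_scaled_rope_patch():
    # Swap the class before the model is instantiated so every layer uses it.
    llama.LlamaRotaryEmbedding = ScaledRotaryEmbedding
```

`apply_scaled_rope_patch()` would be called before `AutoModelForCausalLM.from_pretrained(...)` so the patched embedding is used when the model is built.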
Using Oobabooga with Exllama?
```
python server.py --max_seq_len 8192 --compress_pos_emb 4 --loader exllama_hf
```
(`--compress_pos_emb 4` is the inverse of the 0.25 scaling factor: position ids are divided by 4.)
Training Details
I trained the LoRA with the following configuration (a sketch of these settings follows the list):
- 1200 samples (~400 samples over 2048 sequence length)
- learning rate of 3e-4
- 3 epochs
- The exported modules are:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
- no bias
- Rank = 4
- Alpha = 8
- no dropout
- weight decay of 0.1
- AdamW with beta1 of 0.9, beta2 of 0.99, and epsilon of 1e-5
- Trained on 4-bit base model
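For reference, the hyperparameters above could be expressed with the peft and transformers APIs roughly as in the sketch below. `MODEL_PATH` is a placeholder, and the dataset, tokenizer, and 4-bit loading used in the actual run are not shown; this is an assumed reconstruction of the listed settings, not the original training script.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

MODEL_PATH = "path/to/llama-30b"  # placeholder for the 4-bit base model

lora_config = LoraConfig(
    r=4,                       # Rank = 4
    lora_alpha=8,              # Alpha = 8
    lora_dropout=0.0,          # no dropout
    bias="none",               # no bias
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="superhot-lora",
    num_train_epochs=3,
    learning_rate=3e-4,
    weight_decay=0.1,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-5,
)

model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float16)
model = get_peft_model(model, lora_config)
```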