There's a GGUF tokenization issue where the <|im_end|> EOS token is not handled as a single special token and instead gets broken into multiple smaller tokens, which degrades performance for GGUF users. I need to submit an issue ticket. I'm unsure whether this affects other quantization formats.
If possible, please use the safetensors or Exllama2 weights for best performance. Exllama2 tokenizes correctly.
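To illustrate why this matters, here is a toy sketch (not the actual llama.cpp or GGUF code, and the token ids are hypothetical): a tokenizer that recognizes `<|im_end|>` as a special token emits the single EOS id the model was trained on, while one that misses the match falls back to splitting the literal text into fragments, so the model never sees its EOS id.

```python
# Toy illustration of the failure mode. SPECIAL_TOKENS and its ids are
# made up for this sketch; real tokenizers map specials to trained ids.
SPECIAL_TOKENS = {"<|im_start|>": 1, "<|im_end|>": 2}

def tokenize(text: str, respect_specials: bool) -> list[int]:
    """Specials map to one id; otherwise fall back to per-character pieces
    (a stand-in for sub-word splitting)."""
    if respect_specials and text in SPECIAL_TOKENS:
        return [SPECIAL_TOKENS[text]]
    return [ord(c) for c in text]

correct = tokenize("<|im_end|>", respect_specials=True)
broken = tokenize("<|im_end|>", respect_specials=False)
print(correct)      # [2]  - one EOS id, as trained
print(len(broken))  # 10   - many fragments, EOS id never emitted
```

In practice you can compare `tokenizer.encode("<|im_end|>")` across backends to spot the mismatch.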
Thank you to Lucyknada for the Exl2 quants: https://huggingface.co/lucyknada/cgato_Nemo-12b-Humanize-KTO-v0.1-exl2
This is the first public release of the Humanize model I've been working on, so folks have a stable repo to download from and can expect a consistent level of performance.
## Goals
The goal of this model is to make something that feels different from other models out there. I wanted to accomplish this in two ways: first, by using as little synthetic data as possible, and second, by hand-judging KTO samples to avoid common AI phrases and repetition. The model has seen very little synthetic data and is unusable without RL. The RL dataset was built by pulling responses from the model itself and having a human (me) review each one. The KTO dataset consisted of 7122 human-judged responses: 2398 accepted and 4724 rejected. To balance this, the KTO finetune was run with a 2.5x weight on desirable responses.
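A back-of-envelope check of the numbers above (the exact effect of the weight depends on the KTO trainer's loss; this just shows the imbalance the 2.5x desirable weight compensates for):

```python
# Dataset counts from the card above.
accepted, rejected = 2398, 4724
total = accepted + rejected          # 7122 human-judged responses
imbalance = rejected / accepted      # ~1.97x more rejections than accepts
weighted_accepts = accepted * 2.5    # effective desirable mass after weighting
print(total, round(imbalance, 2), weighted_accepts)
```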
One result of using so much human data is that this model does not call itself ChatGPT or Claude. Examples below.
## Chat Format: ChatML with System Prompt
Logits taken for the above input with samplers disabled.
Some sample completions taken with the Sampler settings listed at the end of the document.
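For reference, ChatML with a system prompt lays each turn out like this (the bracketed text is a placeholder):

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{model response}<|im_end|>
```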
Note that if you tell it that it's an AI and ask it for its name, it will call itself Claude/GPT. I did not tune to encourage or remove this behavior.
## Issues
- GGUF currently does not tokenize the end token correctly, resulting in lower performance for GGUF users.
- Male characters are not thoroughly represented in the RL dataset.
- More work is needed to polish assistant answers and performance.
- The model sometimes writes for the user during the first few responses; this will probably be solved with more data. For now, just regenerate.
All samples were taken at bfloat16 using Transformers with FA2 (FlashAttention 2).
## Examples: Roleplay
## Example Chat
## Sampler Settings
The above examples were taken from the model using the below sampler settings. Feel free to experiment.
- Temp: 0.85
- TopK: 40
- TopP: 0.9
- RepPen: 1.1
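As a sketch, these settings map onto Hugging Face `generate()` keyword arguments like so (parameter names per the transformers sampling API; load the model and tokenizer however you normally would):

```python
# The sampler settings above, expressed as `model.generate()` kwargs.
gen_kwargs = dict(
    do_sample=True,          # enable sampling rather than greedy decoding
    temperature=0.85,        # Temp
    top_k=40,                # TopK
    top_p=0.9,               # TopP
    repetition_penalty=1.1,  # RepPen
)
# usage (illustrative): model.generate(**inputs, **gen_kwargs)
```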
For roleplay, use ChatML with a system prompt; you may append the character name to the start of messages if needed.
Formatting settings for SillyTavern. It should end up as ChatML with username/character names.