Llama-3 SillyTavern Presets Sharing

#5
by Lewdiculous - opened
AetherArchitectural org
edited May 12

This might be the place for preset sharing in these trying initial Llama-3 times.

I'll share my recommendations so far:

Chaotic's simple presets here

And @Virt-io's great set of presets here - recommended. He puts a lot of effort into these.

Lewdiculous pinned discussion

@Lewdiculous

Can you try my context and instruct templates? I think I've ironed out all the issues.


For anyone making their own templates:

Change the chat start to something similar. It gets sent before the first message and functions as a separator between the card info and the chat.
context.png

And add an assistant prefix. It gets sent on every generation but does not stay in context, so if you use stat boxes or similar, you may want to put those instructions here; that keeps them from being lost in context.
SET THE PREFIX TO BE SENT AS USER, NOT AS SYSTEM
prefix.png
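For anyone wiring this up by hand, here's a minimal sketch of the two fields being described, using SillyTavern's context/instruct template field names (shown as a Python dict so it can carry comments; the strings are illustrative placeholders, not the actual preset values):

```python
# Sketch of the two SillyTavern template fields discussed above.
# Field names follow ST's context/instruct JSON presets; the string
# contents are illustrative, not Virt-io's actual values.

context_template = {
    # Sent once, between the card info and the first message, acting
    # as a separator between character data and the chat itself.
    "chat_start": "### New Roleplay:",
}

instruct_template = {
    # Sent on every generation but never stored in chat history, so
    # per-response instructions (stat boxes, format reminders) can't
    # drift out of context. Note the *user* role headers, per the
    # warning above about not sending this as system.
    "last_output_sequence": (
        "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        "[Reminder: keep the stat box at the end of every reply.]"
        "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    ),
}
```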

AetherArchitectural org

That looks pretty good, mate. I'll give them a try next time I'm down bad.

> Change the chat start to something similar. It gets sent before the first message and functions as a separator between the card info and the chat.

If it works, this will be amazing. I always went back to Solar because Llama-3 models would be weird if you listed things a character liked in the card: they would just randomly start talking about those things in a normal conversation. That always seemed like a fault of my settings rather than the models themselves. I'm going to try the presets with [Llama-3SOME-8B-v1-BETA](https://huggingface.co/TheDrummer/Llama-3SOME-8B-v1-BETA).
Edit - Llama-3Some-Beta misgenders the user; going back to Llama3-Sovl resolves it. Also, I've noticed the v1.5 presets make models write longer messages.

@saishf

The longer responses are intentional; if you want shorter responses, edit Consistent Pacing and Creating a Scene in the Style Guidelines section.

If you want different formatting, edit the Text Formatting Guidelines section.

If you run into weird issues with my templates, please tell me.

Roleplay-v1.5 are the best presets I've used so far. I'm also using the dynamic temp samplers, and this is the only combination that can successfully play a character like a zombie and remember that they aren't a normal human.
E.g.:

{{char}}'s voice is low and raspy, almost unintelligible due to the virus.

Big win :3

> @saishf
>
> The longer responses are intentional; if you want shorter responses, edit Consistent Pacing and Creating a Scene in the Style Guidelines section.
>
> If you want different formatting, edit the Text Formatting Guidelines section.
>
> If you run into weird issues with my templates, please tell me.

The only issue I've had so far is random extra quotation marks and asterisks, like these:
Screenshot_20240429-012044.png

Screenshot_20240429-012248.png

It's not a deal breaker though; a regen fixes it.

Edit - I'm going to try removing the text formatting section to see if it helps.

@saishf

The formatting issues are from [Dynamic-Temp]Roleplay; I am working on fixing these. At the moment some sliders are too high, causing it to be dumb.

Also, try a Q8_0 model; honestly, it is not that slow if you're only using the native 8K context.

AetherArchitectural org

Response formatting and quotation/asterisk placement in Llama-3s is the only deal breaker for me; otherwise their prose is fantastic for the size.

> @saishf
>
> The formatting issues are from [Dynamic-Temp]Roleplay; I am working on fixing these. At the moment some sliders are too high, causing it to be dumb.

I'll switch to static samplers :3. I tried playing with the samplers to see if I could half-fix it, and my character started speaking Russian 😺

> Response formatting and quotation/asterisk placement in Llama-3s is the only deal breaker for me; otherwise their prose is fantastic for the size.

It managed to surpass Claude 3 Haiku on LMSYS's English-only leaderboard 😼

I tried older dynamic temp presets with Llama-3; I think it just hates dynamic temp entirely. None had any success fixing the formatting, while switching to static presets fixed it instantly.
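For reference, a plain static baseline of the kind that fixed it looks something like this (illustrative values, not the actual preset's settings):

```python
# Illustrative static sampler settings: flat temperature plus Min-P,
# with dynamic temperature disabled. Not the presets' exact values.
static_samplers = {
    "temp": 1.0,
    "min_p": 0.05,        # prune tokens below 5% of the top token's probability
    "top_p": 1.0,         # effectively disabled
    "top_k": 0,           # disabled
    "dynatemp": False,    # the feature Llama-3 seems to hate
    "repetition_penalty": 1.05,
}
```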

@saishf

I have updated both samplers. Give them a try.


@Lewdiculous

I am not having any formatting issues with Llama3-8B-Instruct. I am using Q8_0, which might be the reason.

Screenshot_20240429-015825.png
Newest Dyna samplers.

Is it possible to RP with instruct? It's so smart 😿

Edit - static samplers are almost perfect; 9/10 responses are formatted correctly.

AetherArchitectural org
edited Apr 28

As for models, they will have to run well enough at 4.5 BPW / Q4_K_M quants at least, so most people can benefit from them. I won't rush; things will improve.
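As a rough size check, approximate llama.cpp bits-per-weight averages put those quants here for an 8B model (ballpark figures only):

```python
# Ballpark GGUF file sizes for an 8B model. The bpw values are
# approximate llama.cpp averages for each quant, not exact.
params = 8.0e9
for name, bpw in [("Q4_K_M", 4.85), ("Q8_0", 8.5)]:
    print(f"{name}: ~{params * bpw / 8 / 1e9:.1f} GB")
# Q4_K_M: ~4.9 GB, Q8_0: ~8.5 GB (plus KV cache on top)
```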

@saishf

I have fixed the repetition issue; it appears Llama-3 hates having two system prompts.
So, I changed the prefix to be sent as user.
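In other words, the per-turn prefix keeps its text but swaps roles. An illustrative before/after of the raw prefix string (not the exact preset contents):

```python
# Before: the prefix reused the system role, so the model saw a second
# system block on every turn and tended to loop/repeat.
prefix_as_system = (
    "<|eot_id|><|start_header_id|>system<|end_header_id|>\n\n"
    "[guidelines...]"
)

# After: identical text, delivered under the user role instead.
prefix_as_user = (
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "[guidelines...]"
)
```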

> @saishf
>
> I have fixed the repetition issue; it appears Llama-3 hates having two system prompts.
> So, I changed the prefix to be sent as user.

I'll be sure to try it tonight! Timezones 🥲

Quick testing: this took seven regens to get with the dynamic samplers, which is already so much better.
Screenshot_20240430-030617.png
But the simple samplers cause repetition every second message 😭
New models 😿

@saishf

The file that I updated was the instruct preset.

If you are comfortable, can you send me the card you are using so I can test it?

> @saishf
>
> The file that I updated was the instruct preset.
>
> If you are comfortable, can you send me the card you are using so I can test it?

I re-downloaded all four just to be sure, and the model is even merged with a quote/asterisk format LoRA @_@

https://files.catbox.moe/yw8hg1.png

Catbox, since I'm unsure whether HF strips metadata / character info :3

Edit - I don't know if the card just sucks; it's from about a year ago and I've just kept it around ever since.

Also, the combination of dynamic temp + the presets is amazing
Screenshot_20240430-031355~2.jpg
≥ Solar 😼

@saishf

I made some edits. I kind of feel bad now.

Using the simple samplers. Dynamic-temp still needs more work.
image.png


Card HERE

This is just personal preference, but I feel like small models get confused by first-person points of view. I just use third person for everything, even my own replies.

> I made some edits. I kind of feel bad now.

No wonder Glade is scared of everyone 😿

> This is just personal preference, but I feel like small models get confused by first-person points of view. I just use third person for everything, even my own replies.

They do get confused sometimes, but I never really care enough to worry. It's still fun, and sometimes the results are crazy 😭
Plus, I generally confuse the model with my English before it gets confused by itself.
I just remember trying out small models like Pygmalion 6B ages ago; I'm glad local AI is good in comparison now.
If only there were a ~30B Llama-3, then I'd have an excuse to buy a single 24GB GPU...
I'd like a T4 for the power consumption, but they cost the same as a 3090 😕

Also, have you tried running Phi-3? I found it uses 9GB of VRAM at 16K context using Q5_K_M, which is about a gigabyte more than Llama-3, a model twice its size.
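A rough back-of-the-envelope suggests the gap is mostly KV cache: the original Phi-3-mini config ships without GQA (32 KV heads), while Llama-3-8B uses 8. Architecture numbers below are taken from the published configs; treat the totals as ballpark:

```python
# Approximate fp16 KV-cache size (no cache quantization).
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int, ctx: int) -> float:
    bytes_per_elem = 2  # fp16
    # 2x for the separate K and V tensors in every layer.
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 2**30

print(kv_cache_gib(layers=32, kv_heads=8,  head_dim=128, ctx=16384))  # Llama-3-8B: ~2.0 GiB
print(kv_cache_gib(layers=32, kv_heads=32, head_dim=96,  ctx=16384))  # Phi-3-mini: ~6.0 GiB
```

Which lines up with the smaller model eating more VRAM at 16K.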

@saishf

I tried Phi-3, but it was all over the place.


I made some changes to the presets.
Formatted the guidelines into a JSON-like structure and moved them into the 'last assistant prefix'.
The only downside is that you have to reprocess them every response, though they don't stay in context. So, maybe it is an upside?


Also added an example character. 😳

image.png
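Roughly what a JSON-like guideline block in the last assistant prefix looks like (keys and wording illustrative, not the actual v1.x preset contents):

```python
# Illustrative guideline block carried in last_output_sequence: it is
# re-sent (and reprocessed) every generation, but never kept in the
# chat history, so it can't drift out of context.
guidelines = {
    "style": ["third person", "past tense"],
    "formatting": {"speech": '"double quotes"', "actions": "*asterisks*"},
    "pacing": "advance the scene gradually; never speak or act for {{user}}",
}
```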

> I tried Phi-3, but it was all over the place.
>
> I made some changes to the presets.
> Formatted the guidelines into a JSON-like structure and moved them into the 'last assistant prefix'.
> The only downside is that you have to reprocess them every response, though they don't stay in context. So, maybe it is an upside?
>
> Also added an example character. 😳
>
> image.png

I hope Phi-3 7B isn't as weird as Mini and uses less VRAM. They do score crazy high on MMLU though.

It's not really an issue, as Kobold ingests insanely fast at the beginning, 1K+ T/s.

And the instruct has made characters a little more aware? But it still has the cursed asterisks-in-narration bug.

1000036215_x16.png

RP in the Shakespearean 'ra :3

> @saishf
>
> I tried Phi-3, but it was all over the place.
>
> I made some changes to the presets.
> Formatted the guidelines into a JSON-like structure and moved them into the 'last assistant prefix'.
> The only downside is that you have to reprocess them every response, though they don't stay in context. So, maybe it is an upside?
>
> Also added an example character. 😳
>
> image.png

Just downloaded Hermes 2 Pro Llama-3, and the random asterisk bug is still present in a model using ChatML. So it seems it's a Llama-3 thing, not a presets thing.

OK, so all the weird stuff I was doing with the last assistant prefix was a dud.

But including a message before the first message is working well.

Also, me being blind: the chat separator should go in the First Assistant Prefix:

image.png
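Per that correction, the separator lives in the instruct template's first-output field, which fires only before the model's first reply (field name per ST's instruct JSON; the string itself is illustrative):

```python
instruct_template = {
    # Fires once, before the first assistant reply, so it cleanly
    # separates the card info from the chat without repeating later.
    "first_output_sequence": (
        "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
        "### New Roleplay:\n"
    ),
}
```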

> OK, so all the weird stuff I was doing with the last assistant prefix was a dud.
>
> But including a message before the first message is working well.
>
> Also, me being blind: the chat separator should go in the First Assistant Prefix:
>
> image.png

V1.7 definitely seems to behave better. I still encounter Llama-3 formatting errors, but characters don't seem to force "fairytale" scene endings as much as they used to.

@saishf

Are you using the version I uploaded a little while ago? I still left it at v1.7; I just updated the files.

I went and tried to remove anything that would steer cards to behave in a certain way. It should respect cards a lot more now.


I am going to focus on samplers now.

> @saishf
>
> Are you using the version I uploaded a little while ago? I still left it at v1.7; I just updated the files.
>
> I went and tried to remove anything that would steer cards to behave in a certain way. It should respect cards a lot more now.
>
> I am going to focus on samplers now.

I'll try the new ones; the ones I have are the original 1.7s.

And I think the samplers are the root cause of a lot of the formatting issues; messing around with the templates in ST, I found Llama-3's formatting would literally implode with some of them.

You can try the test samplers; they work decently.

> You can try the test samplers; they work decently.

The new samplers are much better; they'll commit torture while the simple roleplay samplers won't.
They will also actually state where something came from, instead of saying a character magically pulled a massive item out of thin air or conjured fire, etc.
Testing with Soliloquy V2 and the latest presets :3
Edit - maybe a little too torturous, bordering on murderous. I called a character an idiot and they yeeted me into a wall. Figured that might be an issue for some.

@saishf @Lewdiculous

v1.9 | 'Narrative Technique' was causing weird issues, so I removed it.

Also I have been corrupted :)

image.png

AetherArchitectural org
edited May 12

This is still Vanilla. When you say "I can't share my examples", that's when you know you're doing well.
Let it all in...

> This is still Vanilla. When you say "I can't share my examples", that's when you know you're doing well.
> Let it all in...

The struggles of reviewing models when none of your examples are shareable 😭

What are you people doing to the AI? (ಠ_ಠ)

> What are you people doing to the AI? (ಠ_ಠ)

Making use of the hard work being done with OAS 😸 (no point wasting it :3)

AetherArchitectural org

Science isn't pretty.

AetherArchitectural org

Just so more people are aware:

https://huggingface.co/Virt-io/SillyTavern-Presets


Closing for now.

Lewdiculous changed discussion status to closed
