Llama 3 coping mechanisms - Part 3

#6
by Lewdiculous - opened
LWDCLS Research org

It's like an optional DLC for now.

This is a direct Part 3 continuation of Part 2 in this thread.

Merging in all the LoRAs now. When finished it should be ready for quant, but I'll give it a test first.

LWDCLS Research org

Should be able to get to it before bed, we got a few hours to go. No rush.

https://huggingface.co/ResplendentAI/Aura_Uncensored_l3_8B Here is the model, I still need to download and test format adherence, but if it's anything like the preliminary model it should be exactly what a lot of people are looking for.

@Lewdiculous I have tested your format and it is working perfectly on first reply. My work has not been in vain. Open the floodgates, kind sir, and allow the trial by fire to commence...

Dolphin-2.9-llama3-8b announced to be on its way soon. Don't see it on HF yet, but I would expect it by early next week at the latest.

I did a test frankenmerge of base to find out how big a SOLAR DUS might be, and the result was 11.52B parameters. I'd expect that creating a DUS from the Dolphin version would require less post-merge training to remediate as decensoring would already have been performed prior to frankenmerging rather than after.
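For anyone wanting to sanity-check that 11.52B figure, here is a rough sketch. It assumes the published Llama 3 8B config values (hidden size 4096, MLP size 14336, 32 layers, 32 query heads with 8 KV heads, vocab 128256, untied input/output embeddings) and a SOLAR-style DUS that stacks two 24-layer slices into 48 layers. The function name and the 48-layer target are illustrative assumptions, not the actual merge config used for the test above.

```python
def llama_param_count(num_layers: int,
                      hidden: int = 4096,
                      intermediate: int = 14336,
                      num_heads: int = 32,
                      num_kv_heads: int = 8,
                      vocab: int = 128256) -> int:
    """Count parameters of a Llama-style decoder with untied embeddings."""
    head_dim = hidden // num_heads
    kv_dim = head_dim * num_kv_heads          # grouped-query attention

    attn = 2 * hidden * hidden                # q_proj + o_proj
    attn += 2 * hidden * kv_dim               # k_proj + v_proj
    mlp = 3 * hidden * intermediate           # gate, up, down projections
    norms = 2 * hidden                        # two RMSNorm weights per layer
    per_layer = attn + mlp + norms

    embeddings = 2 * vocab * hidden           # embed_tokens + lm_head (untied)
    final_norm = hidden
    return num_layers * per_layer + embeddings + final_norm


base = llama_param_count(32)                  # stock 8B model
dus = llama_param_count(48)                   # assumed DUS: 24 + 24 layers

print(f"base: {base / 1e9:.2f}B, DUS: {dus / 1e9:.2f}B")
```

Under those assumptions the 48-layer stack comes out at roughly 11.52B, which lines up with the number from the test merge. The embeddings are only counted once, which is why the total is not simply 1.5x the base model.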

LWDCLS Research org
•
edited Apr 21

ChatML, my hated beloved! Removed biases... Music to the ears.

@Virt-io

Phi-3-Mini manages to get this question right more often

The phis are impressive little buds, but do you notice how nonsensical the reasoning is? "A pound of steel weighs slightly less than 1 pound" :d No, a pound weighs exactly 1 pound and that should be an easy thing to "get right" at the end of the day, but sadly isn't. It also makes no sense to say "neither weighs the same". They either weigh the same or they don't; this description makes it sound like they can weigh the same on their own in isolation, which is totally illogical because then there is nothing they are compared against. Unless it was trying to insinuate that a pound of steel vs a pound of feathers would not weigh the same (when the units are equal), which would be a mistake. So I would say this is a very bad answer if we are looking for signs of real reasoning capabilities. Phi-3 should be a bit more capable here, though, as it was heavily trained on academic material and is meant to do well specifically with math and academic questions.

As for flash attention - I had very good results with the LMS update last night - up to 45 t/s on Llama-3 hermes pro Q6_K @ 2K context, and 27 t/s @ 8K. Kobold does about 21 t/s with the same model @ 8K. Certainly a very welcome improvement <3
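To put those t/s numbers in perspective, a quick back-of-the-envelope sketch. The throughput values are the single anecdotal readings quoted above, and the 300-token reply length is a made-up example, so treat the output as a rough illustration rather than a benchmark.

```python
def relative_speedup(new_tps: float, old_tps: float) -> float:
    """Fractional gain of new_tps over old_tps (0.25 means 25% faster)."""
    return new_tps / old_tps - 1.0


def seconds_for(tokens: int, tps: float) -> float:
    """Time needed to generate `tokens` at a steady `tps` tokens per second."""
    return tokens / tps


lms_8k = 27.0      # t/s reported above with flash attention @ 8K
kobold_8k = 21.0   # t/s reported above for Kobold @ 8K
reply_len = 300    # hypothetical reply length in tokens

gain = relative_speedup(lms_8k, kobold_8k)
print(f"speedup @ 8K: {gain:.0%}")
print(f"{reply_len} tokens: {seconds_for(reply_len, lms_8k):.1f}s "
      f"vs {seconds_for(reply_len, kobold_8k):.1f}s")
```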

also this :D

image.png

I did notice it was going rather insane, I really hope the 7b brings better reasoning. Using Phi 3 Mini it felt kind of unstable or inconsistent. One minute it's a genius, next second it's lobotomized. And it uses more VRAM. It's kind of just. Useless locally? Might be useful if you have an A100 lying around.

@saishf I'm hoping for that as well, but in a sense it feels like phi-3 mini does so well because it was essentially trained on all of the stuff people usually benchmark for. At the end of the day, the reasoning is limited and falls down on the training. Such small models are very useful if you are heavily limited on resources, though, but it's just really hard to rely on them, and hard to rely even on the biggest ones like gpt4 with full confidence.

Maybe it is because it was trained on structured data?

Causing it to lose its mind when you give it randomly concatenated data.

Ollama makes me sad. Really nice QOL features but doesn't expose all of llama.cpp's features.


It should work in WSL2.

I haven't run benchmarks, but I assume it would be slightly faster. (For me it feels faster, though I am using Arch, with newer libs.)

Since it would be using your system's dependencies.

Arch isn't natively supported in WSL2 :<
It only has Ubuntu, Debian, Kali, Oracle, openSUSE, SUSE Enterprise & openSUSE Tumbleweed.

I only know how to use Ubuntu though 😶‍🌫️

You should be able to build on Ubuntu.

I can try and make a little guide if you're interested.

Also you can make an Arch WSL install with https://github.com/yuk7/ArchWSL (not recommended)

BTW there is supposed to be a way to set up Arch in WSL: https://wsldl-pg.github.io/ArchW-docs/How-to-Setup/

But I've never tried this. You may also use Arch via Docker and enable all of the virtualization stuff there.

Lewdiculous unpinned discussion
LWDCLS Research org
•
edited May 3
Lewdiculous changed discussion status to closed

@saishf I'm hoping for that as well, but in a sense it feels like phi-3 mini does so well because it was essentially trained on all of the stuff people usually benchmark for. At the end of the day, the reasoning is limited and falls down on the training. Such small models are very useful if you are heavily limited on resources, though, but it's just really hard to rely on them, and hard to rely even on the biggest ones like gpt4 with full confidence.

I don't think it'll be possible to fully rely on a model for a while, and even longer locally. I do wish it would come sooner. I'd personally like to see a human evaluation test where a model like gpt4 is put up against university professors in answering complex multi-level questions, with students picking which answer is preferable.
The benchmarks we currently have are just, flawed?
