Not good results --

#3
by Sirfrummel - opened

Hello, I have been using ollama to run this. I first tried the Q8_0 quant and then went to Q6_K MAX.
I tried both a high temperature (1.5) and a lower one (0.7), along with the recommended repetition penalty of 1.15.
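For reference, this is roughly how those settings get passed; a minimal sketch assuming the official ollama Python client (the model tag and prompt are placeholders, and the same values can also be set with PARAMETER lines in a Modelfile):

```python
# Sketch only: passing the sampler settings described above through the
# ollama Python client. The model tag and prompt are placeholders.
import ollama

response = ollama.generate(
    model="moe-darkmultiverse-q6_k",  # hypothetical tag for the quant being tested
    prompt="Write exactly two paragraphs describing a young botanist ...",
    options={
        "temperature": 0.7,      # also tested at 1.5
        "repeat_penalty": 1.15,  # the recommended rep penalty
    },
)
print(response["response"])
```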

I have been using Claude to help analyze local LLM models and their output, and we have two benchmark prompts that Claude came up with to test both instruction following and nuance in the output.

So for example, the recent test prompt was:
"Write exactly two paragraphs describing a young botanist examining a rare night-blooming flower. The botanist is excited about the darkness because it's necessary for the bloom, but simultaneously anxious because they recently discovered signs that someone has been stealing specimens from the greenhouse. Show this contrast between their scientific enthusiasm and personal fear without explicitly stating either emotion."

In all my tests, the MOE DarkMultiverse model went beyond two paragraphs and was very verbose, with a lot of purple prose. Other models I'm running (like the EVA models and R1) had no problem limiting themselves to two paragraphs and scored much better on output.

It looks like only 2 experts are used by default, and I wasn't sure how to increase that in ollama; I was hoping that might lead to better results. If you have any recommendations, please let me know.

Hey,

Thank you for writing, and for your feedback.
This model is based on older "Mistral 7B" models, which do not have the same level of instruction following as newer model types.
I suggest you try the same prompt in one of my Gemma The Writer models, the Dark Planet series, Dark Universe, etc., as these are newer model
types -> Gemma2, Llama 3 (and 3.1 for Spinfire), and Mistral Nemo.

I could not locate any info on how to change the number of experts in ollama.
There was an issue with MOEs about 9 months ago, and they fell "out of favor" due to changes in llama.cpp / quants (a model-breaking change).
That may be part of why MOE support and usage dropped off around that time.
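If you do want to experiment with the active expert count, one possible route (a sketch only, not something I have verified for this model) is to bypass ollama and load the GGUF directly with llama-cpp-python, which accepts a kv_overrides option for overriding model metadata. The key name used below ("llama.expert_used_count") is an assumption based on Mixtral-style MOE GGUFs, and the model path is a placeholder; check the quant's metadata to confirm the exact key:

```python
# Sketch only: loading the GGUF directly with llama-cpp-python and
# overriding the experts-used count via kv_overrides.
# "llama.expert_used_count" is an assumed key name for Mixtral-style MOEs;
# inspect the GGUF metadata to confirm it before relying on this.
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/moe-darkmultiverse-q6_k.gguf",  # placeholder path
    n_ctx=4096,
    kv_overrides={"llama.expert_used_count": 4},  # default appears to be 2
)

out = llm(
    "Write exactly two paragraphs describing a young botanist ...",
    max_tokens=512,
    temperature=0.7,
    repeat_penalty=1.15,
)
print(out["choices"][0]["text"])
```

Whether raising the count actually improves instruction following is a separate question; it mainly changes how many experts are active per token, at some cost in speed.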

You might also want to try any of the Llama 3.2 models I have built, both "regular" and MOEs -> Llama 3.2 is a lot stronger than Llama 3 and 3.1 in terms
of instruction following.

Hopefully this helps.
