Broken model

#1
by dillfrescott - opened

This model starts out okay and eventually devolves into insanity or repeats the same phrase over and over, no matter the settings :(

If 9 copper ingots make a copper block, and 9 blocks make a x1 compressed copper block, with 9 x1 copper blocks making a x2 copper block and so on... How many ingots would be in a x9 copper block?

Let's denote the base units as "unit ingot" and powers of compression as subscript numbers like this:

  • unit ingot = 1 ingot
  • x1 ingot = 9 ingots (since 9 ingots make a block)
  • x2 ingot = 8 cubed * unit ingot = 5 cubed * ingots per block^3 = ** cubes ingots**
  • x3 ingot = 8 cubed * x2 ingot = (**)^ cubed * ingots per block^ cubed = ^ cubed * ingots per block^ cubed ingots
  • x4 ingot = 8 cubed * x3 ingot = (^ cubed)^ cubed * ingots per block^ cub cubed ingots
  • ...
  • xn ingot = 8 cubed *xn-1 ingot = (**^ cubed)^ cubed^(n-1) * ingots per block^ cub cub cub cub cub cub cub cub [... "cub" keeps repeating until the output is cut off]
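
For comparison, here is a minimal sketch of the arithmetic the prompt seems to ask for. It assumes the reading where a plain block is 9 ingots and every compression level multiplies by 9, so an xN block holds 9^(N+1) ingots. The function and constant names are made up for illustration.

```python
INGOTS_PER_BLOCK = 9   # from the prompt: 9 ingots make a block
BLOCKS_PER_LEVEL = 9   # from the prompt: 9 of one tier make the next tier


def ingots_in_compressed_block(level: int) -> int:
    """Ingots in an x<level> compressed block; level 0 is a plain block."""
    if level < 0:
        raise ValueError("level must be >= 0")
    return INGOTS_PER_BLOCK * BLOCKS_PER_LEVEL ** level


print(ingots_in_compressed_block(9))  # 9**10 = 3486784401
```

Under that reading, a x9 copper block would contain 3,486,784,401 ingots.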

I was informed that it may have something to do with incorrect default RoPE values, but I am not positive.

Thank you for trying the model and for your feedback! I will investigate further!
May I ask what you are using to run inference?

It seems to be an EOS problem. Can you share your setup configuration?
You should have both and <|endoftext|> as eos tokens
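
To illustrate why a missing EOS token can produce endless repetition, here is a toy sketch of a generation loop that stops on any of several EOS ids. The token ids, `toy_model`, and `generate` are all hypothetical and not taken from any real tokenizer or inference backend.

```python
def generate(next_token, eos_ids, max_new_tokens=50):
    """Collect tokens until one of eos_ids appears or the budget runs out."""
    out = []
    for _ in range(max_new_tokens):
        tok = next_token(out)
        if tok in eos_ids:
            break
        out.append(tok)
    return out


# Hypothetical token ids, for illustration only.
EOS_A, EOS_B, WORD = 2, 7, 42


def toy_model(out):
    """Emit three words, then EOS_B once, then fall back to repeating WORD."""
    return EOS_B if len(out) == 3 else WORD


full = generate(toy_model, {EOS_A, EOS_B})  # both EOS ids registered
partial = generate(toy_model, {EOS_A})      # EOS_B missing from the config

print(len(full), len(partial))
```

With both ids registered the loop stops after three tokens. With one missing, the unrecognized EOS is treated as ordinary text and generation runs until the token budget is exhausted.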

I'm unable to run this until it's quantized, for obvious reasons, but I had similar issues with other Yi models until I modified the RoPE parameters. KoboldCpp (and presumably llama.cpp) expects 4k context by default, so you need to modify the RoPE settings for 200k. "--ropeconfig 1.0 5000000" worked well for me. Maybe give that a shot.
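
As a rough illustration of why the base matters, here is a sketch that compares rotary wavelengths for two bases. It assumes the second --ropeconfig value is the RoPE frequency base, that the fallback base is 10000, and that the head dimension is 128. None of these is confirmed in this thread.

```python
import math


def rope_wavelengths(base: float, head_dim: int = 128) -> list[float]:
    """Wavelength in tokens of each rotary pair: 2*pi * base**(2i/head_dim)."""
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]


# 10000 is an assumed default base; 5000000 is the value suggested above.
for base in (10_000.0, 5_000_000.0):
    longest = rope_wavelengths(base)[-1]
    print(f"base={base:>11,.0f}  longest wavelength ~ {longest:,.0f} tokens")
```

Under these assumptions, the slowest-rotating pair completes a full turn well before 200k tokens with the smaller base, while the larger base stretches it far beyond that. This is only a heuristic for why a long-context model can misbehave when loaded with the wrong base.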

Thank you for your suggestion. However, we think this is a Capybara problem: it was trained with a missing eos_token, and merging then messed up the config.
We'll be merging a new model without Capy, stay tuned.

Ah, sorry I didn't get to this sooner. I was using llama.cpp!

The EOS token problems are fixed in the new versions. Try them out!

Difference between SG and GS?

They are the same merge with the order swapped. I made both to test which one would be better :)

mlinmg changed discussion status to closed
