Instruct-v0.1 or Instruct-v0.2 based?
#4 by zappa2005 - opened
I couldn't find this information on the model page, hence the question: is this based on v0.1 or v0.2?
I read somewhere that v0.1 uses sliding-window attention (SWA) to handle longer contexts, which did not work well in some cases, so they changed that in v0.2.
I guess I read it here, but I'm not sure - it was late. Thank you!
https://www.reddit.com/r/LocalLLaMA/comments/18k0fek/psa_you_can_and_may_want_to_disable_mixtrals/
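For anyone unfamiliar with the term, here is a rough sketch of what a sliding-window causal mask looks like. It is only an illustration, not the actual Mistral or Mixtral code: the function name `sliding_window_causal_mask` is made up, and it assumes each token may attend to itself plus at most `window - 1` earlier tokens (real implementations may define the window boundary slightly differently).

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a boolean attention mask (hypothetical helper, for illustration).

    mask[i][j] is True when query position i may attend to key position j.
    Assumption: a token sees itself and at most `window - 1` previous tokens.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print the mask for a toy sequence: 'x' = attended, '.' = masked out.
    for row in sliding_window_causal_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With `window` at least as large as `seq_len`, this reduces to an ordinary causal mask, which is roughly what disabling the sliding window amounts to.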
What? This is Mixtral-8x7B-Instruct-v0.1, not Mistral-7B-Instruct-v0.2. This is a Mixtral model, and there is no v0.2 Instruct for Mixtral.
IkariDev changed discussion status to closed