Will a version of Mixtral with an extreme context size be available in the future?

#56
by StationaryWeaver - opened

🥳First, thanks and a salute to everyone who made Mixtral available to the public!🥳

I am running models purely on CPU at the highest possible precision.
For me, Mixtral-8x22B-Instruct's behavior is much more understandable than Mixtral-8x7B-Instruct's.🧐
However, Mixtral-8x22B-Instruct "lost its mind" beyond a 20k-token context.🤪

I believe there should be some strategy for condensing attention over the previous context, rather than simply sliding a window over the raw tokens. Such a condensing step could significantly reduce the amount of attention computation required.
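For what it's worth, here is a minimal sketch of the sliding-window masking I mean: each token attends only to the last `window` tokens, so anything older simply falls out of view. This is an illustration of the general idea, not Mixtral's actual implementation; the function name and window size are made up for the example.

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    # Boolean attention mask: position i may attend to positions j
    # with i - window < j <= i (causal, limited to the last `window` tokens).
    # Older tokens are masked out entirely, which is why long-range
    # information can get lost without some strategy to condense it.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=8, window=3)
print(mask.astype(int))  # each row has at most 3 ones
```

Stacking several such layers lets information propagate further than the window indirectly, but in my experience that is not the same as attending to the old context directly.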

I'm going to try Mistral Large 2 soon.🤗 I hope it stays coherent for longer.
