Will an extreme-context-size version of Mixtral be available in the future?
#56
by
StationaryWeaver
- opened
🥳First, thanks and a salute to everyone who made Mixtral available to the public!🥳
I am running models purely on CPU at the highest possible precision.
For me, Mixtral-8x22B-Instruct's behavior is much more understandable than Mixtral-8x7B-Instruct's.🧐
However, it (Mixtral-8x22B-Instruct) "lost its mind" beyond a 20k+ context size.🤪
I believe there should be some strategy for condensing attention over previous context, rather than sliding a raw window over it and feeding everything to the model. Such a preprocessing step could significantly reduce the amount of attention the data requires.
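To illustrate the idea (this is only a toy sketch, not how Mixtral actually works): instead of sliding a fixed window over raw tokens, older context could be compressed into a short digest while recent tokens are kept verbatim. The `summarize` function here is a hypothetical stand-in for a real summarizer or attention-pooling step.

```python
def summarize(tokens, budget):
    """Placeholder compressor: keep an evenly spaced sample of tokens.
    A real system would use a learned summarizer instead."""
    if len(tokens) <= budget:
        return list(tokens)
    step = len(tokens) / budget
    return [tokens[int(i * step)] for i in range(budget)]


def build_context(tokens, window=8, digest_budget=4):
    """Keep the most recent `window` tokens raw; compress everything
    older into at most `digest_budget` digest tokens."""
    old, recent = tokens[:-window], tokens[-window:]
    return summarize(old, digest_budget) + recent


# With 20 tokens, the model sees only 12: a 4-token digest of the
# old context plus the 8 most recent tokens unchanged.
tokens = list(range(20))
ctx = build_context(tokens)
```

The trade-off is that the digest loses detail, but the attention cost grows with the digest budget plus the window, not with the full history.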
I'm going to try Mistral Large 2 soon.🤗 Hope it stays more coherent.