Long Context Successor?
#17 by brucethemoose - opened
Zephyr Alpha/Beta seems excellent, with a major exception. It doesn't handle long context as well as Amazon's MistraLite 32K model:
https://huggingface.co/amazon/MistralLite
Are there any plans to adopt some of Amazon's tricks, such as the very large rope_theta, the 16K sliding window, and the 16K training? Whatever they did seems to work extremely well, better than other long-context Llama finetunes/LoRAs I've tried.
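For concreteness, here is a minimal sketch of the config differences in question. The values come from the published config of amazon/MistralLite versus a stock Mistral-7B-style config; the helper function itself is illustrative and not part of any library:

```python
# Sketch of the MistralLite-style overrides vs. a stock Mistral-style config.
# rope_theta / sliding_window / max_position_embeddings values reflect the
# models' published config.json files; the helper name is hypothetical.

def mistralite_style_overrides(base_config: dict) -> dict:
    """Return a copy of a Mistral-style config with MistralLite's
    long-context settings applied."""
    cfg = dict(base_config)
    cfg["rope_theta"] = 1_000_000   # much larger RoPE base than the default 10_000
    cfg["sliding_window"] = 16_384  # 16K sliding-window attention (vs. 4K)
    return cfg

# Stock Mistral-7B-style defaults for comparison:
base = {
    "rope_theta": 10_000,
    "sliding_window": 4_096,
    "max_position_embeddings": 32_768,
}
print(mistralite_style_overrides(base))
```

Loading a checkpoint with these values overridden is straightforward in transformers (e.g. via `AutoConfig.from_pretrained(..., rope_theta=1_000_000)`), but whether a Zephyr finetune would benefit without retraining on long sequences, as Amazon did, is the open question.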