Long Context Successor?
#17 by brucethemoose - opened
Zephyr Alpha/Beta seems excellent, with a major exception. It doesn't handle long context as well as Amazon's MistraLite 32K model:
https://huggingface.co/amazon/MistralLite
Are there any plans to adopt some of Amazon's tricks, such as the very large rope_theta, the 16K sliding window, and the 16K training? Whatever they did seems to work extremely well, better than other long-context Llama finetunes/LoRAs I've tried.
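For concreteness, here is a minimal sketch of the config differences in question. The values come from the published config of amazon/MistralLite versus a stock Mistral-7B-style config; the helper function itself is illustrative and not part of any library:

```python
# Sketch of the MistralLite-style overrides vs. a stock Mistral-style config.
# rope_theta / sliding_window / max_position_embeddings values reflect the
# models' published config.json files; the helper name is hypothetical.

def mistralite_style_overrides(base_config: dict) -> dict:
    """Return a copy of a Mistral-style config with MistralLite's
    long-context settings applied."""
    cfg = dict(base_config)
    cfg["rope_theta"] = 1_000_000   # much larger RoPE base than the default 10_000
    cfg["sliding_window"] = 16_384  # 16K sliding-window attention (vs. 4K)
    return cfg

# Stock Mistral-7B-style defaults for comparison:
base = {
    "rope_theta": 10_000,
    "sliding_window": 4_096,
    "max_position_embeddings": 32_768,
}
print(mistralite_style_overrides(base))
```

Loading a checkpoint with these values overridden is straightforward in transformers (e.g. via `AutoConfig.from_pretrained(..., rope_theta=1_000_000)`), but whether a Zephyr finetune would benefit without retraining on long sequences, as Amazon did, is the open question.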