Loading the model
#3
by PyrroAiakid - opened
I'm running into issues loading this model too. Gotta love our super helpful community, right?
Why is the context length 2048? Was it cut in half? The base Llama 2 model has a 4096 context length. If it's indeed 2048, it's not the first time a model has been massacred like that.
It's just the GQA (grouped-query attention), which should be 8 for 70B models. If you multiply 1024 by 8 you get 8192. Try adding the -gqa 8 parameter, or set gqa to 8.
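For reference, with GGML-era llama.cpp builds the flag was passed on the command line. A minimal sketch; the model filename here is hypothetical, so substitute your own:

```shell
# -gqa 8 tells the loader the 70B model uses grouped-query attention
# with 8 KV head groups, so the attention dimensions are read correctly.
# -c 4096 requests the full Llama 2 context length.
./main -m llama-2-70b.ggmlv3.q4_K_M.bin -gqa 8 -c 4096 -p "Hello"
```

Without -gqa 8, 70B GGML files fail to load or report mismatched tensor shapes, because the file format itself didn't store the GQA value.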