Context length is not 128k
#41
by pseudotensor - opened
vLLM uses a default of 8k, and I can't make it use 128k.
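For reference, here's a minimal sketch of requesting the full window via vLLM's `max_model_len` parameter (the model path is a placeholder). vLLM validates this against the limit it derives from config.json, which may be why asking for 128k fails here:

```python
from vllm import LLM

# Sketch: explicitly request a 128k context window instead of the
# derived default. vLLM will reject this if it exceeds the limit it
# computes from the model's config.json.
llm = LLM(
    model="path/to/model",   # placeholder, not this repo's actual path
    max_model_len=131072,    # 128k tokens
)
```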
You can .. just change the config.json.
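A minimal sketch of that edit, assuming the usual Hugging Face field name `max_position_embeddings`; as the objection below argues, this may not be the right knob when RoPE scaling is involved:

```python
import json

# Sketch of the "just change config.json" approach: bump the declared
# maximum position count to 128k and write the file back.
with open("config.json") as f:
    cfg = json.load(f)

cfg["max_position_embeddings"] = 131072  # 128k

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```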
But 128k of context would take over 130 GB of VRAM alone .. I can only fit 64k in 96 GB.
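For a rough sense of where the VRAM goes, here's a back-of-the-envelope KV-cache estimate. The architecture numbers are placeholders, not this model's actual config, and the model weights come on top of this:

```python
# Rough per-sequence KV-cache size; all architecture values below are
# example placeholders, not read from this model's config.json.
num_layers = 80      # placeholder
num_kv_heads = 8     # placeholder (grouped-query attention)
head_dim = 128       # placeholder
bytes_per_elem = 2   # fp16/bf16

def kv_cache_gib(seq_len: int) -> float:
    # One K and one V vector per layer per token, hence the factor of 2.
    total = 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem
    return total / 1024**3

for ctx in (8_192, 65_536, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_gib(ctx):6.1f} GiB KV cache")
```

The cache grows linearly with context length, so doubling the window from 64k to 128k doubles this term, which is why a 96 GB budget can cap out well short of the advertised maximum.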
As I argue in that vLLM thread, I don't think that's how it should be done. You shouldn't just change the embedding size, since RoPE scaling is in use; the scaling should be part of the context-length calculation.
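A sketch of what "part of the calculation" could mean, assuming the common Hugging Face config fields `max_position_embeddings` and `rope_scaling["factor"]` (this model's config.json may use different keys or semantics): the loader derives the effective window instead of trusting a hand-edited embedding size.

```python
import json

# Sketch: compute the effective context window from config.json
# rather than hand-editing max_position_embeddings.
with open("config.json") as f:
    cfg = json.load(f)

base_positions = cfg["max_position_embeddings"]
rope = cfg.get("rope_scaling") or {}
factor = rope.get("factor", 1.0)

# Under linear/NTK-style RoPE scaling the usable window is roughly the
# base position count times the scaling factor.
effective_window = int(base_positions * factor)
print(f"base={base_positions} x factor={factor} -> effective={effective_window}")
```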