Rope scaling implementation
#11, opened by cvdbdo
Do you plan on implementing RoPE scaling in the near future? For example, in transformers:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    rope_scaling={"type": "dynamic", "factor": 2.0},
    device_map="auto",
)
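For context, here is a minimal sketch of what "dynamic" scaling with a factor of 2 is usually taken to mean: the RoPE base is enlarged only once the sequence exceeds the trained context length. The formula follows the dynamic NTK variant as I understand it from Llama-style implementations in transformers. The function names and the default values (context length, head dimension, base) are illustrative assumptions, not values read from this model's config.

```python
def dynamic_ntk_base(seq_len, max_position_embeddings=4096, base=10000.0,
                     dim=128, factor=2.0):
    """Return the RoPE base to use for a sequence of length seq_len.

    All defaults are illustrative placeholders, not taken from a real config.
    """
    # Within the trained context length, the base is left untouched.
    if seq_len <= max_position_embeddings:
        return base
    # Beyond it, the base grows with the sequence length (dynamic NTK scaling).
    scale = (factor * seq_len / max_position_embeddings) - (factor - 1)
    return base * scale ** (dim / (dim - 2))


def inverse_frequencies(base, dim=128):
    """Per-pair rotation frequencies derived from the (possibly scaled) base."""
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


short = dynamic_ntk_base(2048)   # unchanged base
long = dynamic_ntk_base(8192)    # enlarged base, so rotations are slower
freqs = inverse_frequencies(long)
```

A larger base lowers the rotation frequencies, which is what lets positions beyond the trained range map onto angles the model has already seen.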
But don't they use a sliding window attention mechanism for larger contexts?
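To make the sliding window point concrete, here is a rough sketch of a causal sliding-window mask, where each position attends only to itself and the most recent earlier positions. The function name `sliding_window_mask` and the tiny sizes are made up for illustration. This is not the model's actual attention code.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        # Causal (j <= i) and within the last `window` positions (i - j < window).
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_mask(seq_len=5, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Stacking layers lets information travel further than one window, but each individual attention step still sees only `window` positions, so it is a different mechanism from rescaling the rotary embeddings.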
That's not planned in the near future!
lerela changed discussion status to closed