Update to set use_cache: True, which can boost inference performance a fair bit

#1
by TheBloke - opened
No description provided.

I believe it gets set that way automatically because the model was trained with gradient checkpointing, so I'm happy to revert it to make the model easier to use for decoding.
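For readers following along, here is a minimal sketch of the kind of change this PR makes, assuming the setting lives in the repo's `config.json` as a `use_cache` key. The helper name `enable_kv_cache` and the trimmed-down example config are hypothetical and only for illustration; the real file has many more keys.

```python
import json


def enable_kv_cache(config_text: str) -> str:
    """Return the config JSON text with use_cache forced to true."""
    config = json.loads(config_text)
    # Training with gradient checkpointing commonly leaves this as false;
    # decoding benefits from reusing cached keys/values between steps.
    config["use_cache"] = True
    return json.dumps(config, indent=2)


# Hypothetical, trimmed-down config used only for this example.
example = '{"model_type": "example", "use_cache": false}'
updated = enable_kv_cache(example)
print(updated)
```

If editing the file is not an option, it is usually also possible to override the value at load or generation time (for example by passing `use_cache=True` to `generate` in `transformers`), though that depends on the model code.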

zpn changed pull request status to merged
