Using the flash attention option
#6 opened by lentan
config.json seems to say it's using torch attention, but switching it to flash attention raises an error saying it's unimplemented with alibi.
Edit: sorry, just use triton; it's in the readme!
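
For anyone landing here later, a minimal sketch of switching the attention implementation at load time, assuming an MPT-style checkpoint whose config exposes `attn_config['attn_impl']` (the model id below is a placeholder, not confirmed from this thread):

```python
import torch
import transformers

name = "mosaicml/mpt-7b"  # placeholder model id; substitute the checkpoint you're using

# Load the remote config and override the attention implementation
# from the default 'torch' to 'triton' before instantiating the model.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```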
lentan changed discussion status to closed
You beat me to it :)