anthracite-org/magnum-v2-123b · Consider a mistral-7b-instruct-v0.3 to use as an exl2 draft model with this?

Aug 19

So Mistral-Large and mistral-7b-instruct-v0.3 share the same vocab, which allows us exl2 users to use the 7b as a 'draft model', increasing tokens/second from around 13 to 20 when the 7b predicts accurately.

But with the Lumimaid finetune, it obviously misses more often, and no doubt it'll be the same for this magnum.

It'd be nice to have a magnum finetune of the 7b mistral so that it predicts more accurately as a draft model for this one :)

lucyknada

Anthracite org Aug 19

interesting use-case! I've added it internally for future reference, but in our testing draft models seemed to affect text-performance too much, even if it was faster.

lucyknada changed discussion status to closed Aug 19

jukofyork

Aug 20

edited Aug 20

> interesting use-case! I've added it internally for future reference, but in our testing draft models seemed to affect text-performance too much, even if it was faster.

It shouldn't actually affect the text generation though:

https://arxiv.org/abs/2211.17192

> we can make exact decoding from the large models faster, by running them in parallel on the outputs of the approximation models, potentially generating several tokens concurrently, and without changing the distribution.

The larger model only uses the sequence generated by the smaller model up until the location where the max-probability token diverges, so the output should be identical?
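
To make that concrete, here is a minimal sketch of the greedy case only. The "models" are hypothetical toy functions that return the most likely next token (`toy_target`, `toy_draft` and the parameter `k` are made up for illustration), not exllamav2's actual implementation. With sampling, the paper uses a rejection step to keep the distribution unchanged, and a real engine verifies all draft positions in one batched forward pass rather than one call per position.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token sequence to the single
# most likely next token (i.e. greedy decoding).
Model = Callable[[List[int]], int]


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Plain greedy decoding with one model."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_greedy_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> List[int]:
    """Greedy decoding where a draft model proposes up to k tokens at a time."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1. The draft model proposes a short continuation.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target model checks each proposed position.
        for i, tok in enumerate(proposal):
            expected = target(seq + proposal[:i])
            if tok != expected:
                # First divergence: keep the agreed prefix plus the target's
                # own token, and throw away the rest of the proposal.
                seq.extend(proposal[:i])
                seq.append(expected)
                break
        else:
            # No divergence: every proposed token is what the target wanted.
            seq.extend(proposal)
    return seq


def toy_target(seq: List[int]) -> int:
    return (sum(seq) * 7 + 3) % 50


def toy_draft(seq: List[int]) -> int:
    # Agrees with the target except at every third position.
    t = toy_target(seq)
    return t if len(seq) % 3 else (t + 1) % 50


prompt = [1, 2, 3]
plain = greedy_decode(toy_target, prompt, 20)
spec = speculative_greedy_decode(toy_target, toy_draft, prompt, 20)
print(plain == spec)
```

Every token that ends up in the output is one the target model would have picked itself, so a worse draft only lowers the speedup, not the quality.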

gghfez

Aug 21

That's right, it doesn't affect it (at least in the current exllamav2 version) provided the vocab matches. I've seen people have issues with perplexity when they use the wrong draft model.
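
A rough way to sanity-check "the vocab matches" before pairing two models is to compare their token-to-id maps. This is only a sketch: `vocab_mismatches` is a hypothetical helper, not something exllamav2 provides, and the dicts below are toy data. With Hugging Face tokenizers the maps could come from `tokenizer.get_vocab()`.

```python
from typing import Dict, List, Optional, Tuple


def vocab_mismatches(
    target_vocab: Dict[str, int], draft_vocab: Dict[str, int]
) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """Return (token, target_id, draft_id) for every token whose id differs
    between the two vocabularies, or that exists in only one of them."""
    mismatches = []
    for token in sorted(set(target_vocab) | set(draft_vocab)):
        t_id = target_vocab.get(token)  # None if the token is missing
        d_id = draft_vocab.get(token)
        if t_id != d_id:
            mismatches.append((token, t_id, d_id))
    return mismatches


# Toy vocabularies for illustration only.
big = {"<s>": 0, "hello": 1, "world": 2}
small_ok = {"<s>": 0, "hello": 1, "world": 2}
small_bad = {"<s>": 0, "hello": 2, "extra": 3}

print(vocab_mismatches(big, small_ok))
print(vocab_mismatches(big, small_bad))
```

An empty list means every token maps to the same id in both models, which is what the draft setup relies on.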

lucyknada

Anthracite org Aug 21

no worries, the past testing won't affect whether we try it out (it's already in the queue); we're simply looking into other things first that have been on the backburner for a while.