FP16 version of the model and thus will reduce the download time and significantly speed up execution on the GPU (e.g., 1.79 it/s vs 6.64 it/s on A770m). It also reduces memory usage (RAM and VRAM).

Great work! Thanks for your contribution!

bes-dev changed pull request status to merged