FP16 vs FP32

#127 · opened by Taylor658

What are the memory usage, performance, and accuracy trade-offs between FP16 and FP32 precision for Whisper-large-v3 on a typical GPU like the NVIDIA A100?

You can get a rough idea of the memory needed to run any model using this formula:

Approx. memory usage = no. of parameters × bytes per parameter

In practice, actual memory will be a bit higher (activations scale with sequence length, loading libraries, the CUDA context, etc.).
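
As a quick sketch of that arithmetic in plain Python (the ~1.6B parameter count below is approximate):

```python
# Rough VRAM estimate for the model weights alone; real usage will be
# somewhat higher once activations and the CUDA context are loaded.

def approx_weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    return num_params * bytes_per_param / 1e9

whisper_params = 1.6e9  # Whisper large-v3 has roughly 1.6B parameters
print(f"FP32: {approx_weight_memory_gb(whisper_params, 4):.1f} GB")  # ~6.4 GB
print(f"FP16: {approx_weight_memory_gb(whisper_params, 2):.1f} GB")  # ~3.2 GB
```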

FP16 equates to 2 bytes per parameter, and Whisper large-v3 has ~1.6B parameters.

Therefore the memory for the parameters alone would be just over 3.2 GB in FP16, versus roughly double that (~6.4 GB) in FP32.
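
If you want to check this on an actual GPU, here is a minimal sketch using transformers (assuming a CUDA device such as an A100 is available; FP32 is the default when no dtype is passed):

```python
# Sketch: load Whisper large-v3 in half precision and report the weight
# footprint. Passing torch_dtype=torch.float32 (or omitting it) gives FP32.
import torch
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3", torch_dtype=torch.float16
).to("cuda")

n_params = model.num_parameters()
print(f"{n_params / 1e9:.2f}B parameters, "
      f"~{n_params * 2 / 1e9:.1f} GB of weights in FP16")
```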

Thanks for the feedback and the formula!

Taylor658 changed discussion status to closed