Model size
#5
by mrfakename - opened
Hi,
Just curious: why is a 3B parameter model as large as Mistral 7B? It's ~15 GB, which seems strange to me.
2.7B parameters × 4 bytes (32-bit floats) ≈ 10.8 GB.
See config.json:
"torch_dtype": "float32",
Mistral 7B (and Llama-2 7B, etc.) are ~14 GB because they are distributed as 16-bit floats (parameter count × 2 bytes).
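For reference, here's a minimal sketch of that arithmetic in Python (the helper name is made up, and the 7.2B parameter count for Mistral is an approximation):

```python
# Rough on-disk size of a checkpoint: parameter count times bytes per parameter.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def checkpoint_size_gb(num_params: float, torch_dtype: str) -> float:
    """Estimate checkpoint size in GB from parameter count and dtype."""
    return num_params * BYTES_PER_PARAM[torch_dtype] / 1e9

# A 2.7B model stored in float32, as in this repo's config.json:
print(checkpoint_size_gb(2.7e9, "float32"))  # ~10.8 GB
# Mistral 7B / Llama-2 7B, distributed in 16-bit floats:
print(checkpoint_size_gb(7.2e9, "float16"))  # ~14.5 GB
```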
That makes sense. Thanks for the explanation!
mrfakename changed discussion status to closed