Model size

#5 · opened by mrfakename

Hi,
Just curious: why is a 3B-parameter model as large as Mistral 7B? It's ~15 GB, which seems strange to me.

2.7B parameters × 4 bytes (32-bit floats) = 10.8 GB

See `config.json`:

```json
"torch_dtype": "float32",
```

Mistral 7B (and Llama-2 7B, etc.) are ~14 GB because they are distributed as 16-bit floats, so their size is the parameter count × 2 bytes.
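
For anyone landing here later, a minimal sketch of the size arithmetic, plus how to load a float32 checkpoint in half precision with transformers (the repo IDs below are placeholders, not this model's actual name):

```python
import torch
from transformers import AutoModelForCausalLM

# Rough on-disk size: parameter count × bytes per parameter.
params = 2.7e9
print(f"float32: {params * 4 / 1e9:.1f} GB")  # ~10.8 GB
print(f"float16: {params * 2 / 1e9:.1f} GB")  # ~5.4 GB

# Casting to float16 at load time halves the memory footprint.
# "some-org/some-3b-model" is a placeholder repo ID, not this model's.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-3b-model",
    torch_dtype=torch.float16,
)

# Re-saving the cast model writes a checkpoint roughly half the size.
model.save_pretrained("some-3b-model-fp16")
```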

That makes sense. Thanks for the explanation!

mrfakename changed discussion status to closed
