Model size

#5 · opened by mrfakename

Hi,
Just curious: why is a 3B-parameter model as large as Mistral 7B? It's ~15 GB, which seems strange to me.

2.7B parameters × 4 bytes (32-bit floats) = 10.8 GB

See `config.json`:

```json
"torch_dtype": "float32",
```

Mistral 7B (and Llama-2 7B, etc.) are ~14 GB because they are distributed as 16-bit floats, so their size is the parameter count × 2 bytes.
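
For anyone landing here later, a minimal sketch of the size arithmetic, plus how to load a float32 checkpoint in half precision with transformers (the repo IDs below are placeholders, not this model's actual name):

```python
import torch
from transformers import AutoModelForCausalLM

# Rough on-disk size: parameter count × bytes per parameter.
params = 2.7e9
print(f"float32: {params * 4 / 1e9:.1f} GB")  # ~10.8 GB
print(f"float16: {params * 2 / 1e9:.1f} GB")  # ~5.4 GB

# Casting to float16 at load time halves the memory footprint.
# "some-org/some-3b-model" is a placeholder repo ID, not this model's.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-3b-model",
    torch_dtype=torch.float16,
)

# Re-saving the cast model writes a checkpoint roughly half the size.
model.save_pretrained("some-3b-model-fp16")
```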

That makes sense. Thanks for the explanation!

mrfakename changed discussion status to closed
