
What hardware do I need to run an instance of Bloom?

#264
by natserrano - opened

I also have the same question (specifically for the 176B-parameter model). Can anyone answer?

The model needs 352GB of memory just for its bf16 (bfloat16) weights (176B parameters × 2 bytes each), so the most efficient setup is 8x 80GB A100 GPUs. Alternatively, 2x8x 40GB A100s or 2x8x 48GB A6000s can be used. The main reason for recommending these GPUs is that, at the time of this writing, they offer the largest GPU memory, but other GPUs work as well; for example, 24x 32GB V100s can be used.
Reference - https://github.com/huggingface/blog/blob/main/bloom-inference-pytorch-scripts.md
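To make the arithmetic above concrete, here is a minimal sketch that computes the weight-memory requirement from the parameter count and dtype, and checks which of the setups mentioned in the answer have enough total GPU memory to hold the weights. The function names and the `setups` dict are illustrative, not from any library; note that real deployments also need headroom for activations and the KV cache, which is one reason 8x 80GB is preferred over the bare minimum.

```python
# Bytes per parameter for common dtypes; bf16/fp16 use 2 bytes.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params_billions: float, dtype: str = "bf16") -> float:
    """Memory needed just for the model weights, in GB."""
    return num_params_billions * BYTES_PER_PARAM[dtype]

# BLOOM-176B in bf16: 176 * 2 = 352 GB of weights.
need = weight_memory_gb(176, "bf16")

# GPU setups from the answer above: (number of GPUs, GB per GPU).
setups = {
    "8x A100 80GB": (8, 80),
    "2x8x A100 40GB": (16, 40),
    "2x8x A6000 48GB": (16, 48),
    "24x V100 32GB": (24, 32),
}
for name, (count, gb_each) in setups.items():
    total = count * gb_each
    print(f"{name}: {total} GB total, holds {need:.0f} GB of weights: {total >= need}")
```

All four configurations provide at least 352GB in aggregate; the remaining capacity is what absorbs activations, the KV cache, and framework overhead during inference.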

christopher changed discussion status to closed
