
Hardware specs for training the 70B model

#6
by cnut1648 - opened

Hello, nice work!
I wonder if you can disclose some of the hardware specs used to train the model. I am currently experimenting with training a 70B model but have had no success on 8x A100 80GB GPUs (I get out-of-memory errors), even with bf16 + LoRA + DeepSpeed ZeRO-3 Offload + FlashAttention.
Thanks!
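
For reference, this is roughly my current setup (a minimal sketch; the model name, dataset, and hyperparameters are placeholders, not a claim about the WizardLM recipe):

```python
# Minimal sketch: bf16 + LoRA + DeepSpeed ZeRO-3 Offload + FlashAttention.
# Launch with: deepspeed --num_gpus 8 train.py
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "meta-llama/Llama-2-70b-hf"  # placeholder; any 70B llama checkpoint

# Build TrainingArguments *before* from_pretrained so the HF/DeepSpeed
# integration can partition the 70B weights at load time (ZeRO-3).
args = TrainingArguments(
    output_dir="out",
    bf16=True,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,        # big activation-memory savings at 70B
    deepspeed="ds_zero3_offload.json",  # ZeRO-3 + CPU offload config file
    num_train_epochs=1,
    logging_steps=10,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token  # llama has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # needs flash-attn installed
)
model.enable_input_require_grads()  # required for LoRA + gradient checkpointing

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

def tok(batch):
    enc = tokenizer(batch["text"], truncation=True,
                    padding="max_length", max_length=512)
    enc["labels"] = enc["input_ids"].copy()
    return enc

# Tiny dummy dataset so the script runs end to end; swap in real data.
train_ds = Dataset.from_dict({"text": ["Hello world."] * 64}).map(
    tok, batched=True, remove_columns=["text"])

Trainer(model=model, args=args, train_dataset=train_ds).train()
```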

WizardLM Team org

8x A100 80GB GPUs are enough for the 70B training.

Hi @WizardLM, thanks for the reply. Will the training details be released, in a paper or at a high level? I am pretty curious about training a model at the 70B scale. Are you using DeepSpeed ZeRO-3 Offload, or some other acceleration method?
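
For reference, the ZeRO-3 offload config I have been testing (the `ds_zero3_offload.json` from my earlier snippet) looks roughly like this; values are illustrative of my own setup, not a guess at yours:

```python
# Equivalent Python dict for ds_zero3_offload.json; TrainingArguments also
# accepts the dict directly via deepspeed=ds_config.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    # "auto" lets the HF Trainer fill these in from TrainingArguments.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```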

Yeah, agree with @cnut1648! Having the training config would be very helpful!
