What was the cost of the training?
#12
by
ssoltani
- opened
you mentioned for the training you used 8XA100 80G. I was wondering what was the total time of training and what was the cost?
DeepSpeed-ZERO-2 + QLoRA + Flash Attention2, with a batch size of 16, 8xA100(80G) takes 5-7 hours for training, each card takes up 70G VRAM. The training time depends on your machine set-up and may vary a lot.