Training time #13
by sirus - opened
Your model card says:

> Model
> - Architecture: For architecture details, see the blog post.
> - Pretraining steps: 600k
> - Pretraining tokens: 600B
> - Precision: bfloat16
> - Tokenizer: HuggingFaceTB/cosmo2-tokenizer
>
> Hardware
> - GPUs: 64 H100
For SmolLM-135M, how long did it take to train to 600k steps with 64 H100s? I'm trying to get an idea of how much it would cost me to do training runs where I change small bits of the architecture.
I'd really love an answer here. I need a platform for testing out various ideas, and I have no idea if this is within my budget. Are we talking tens of thousands, hundreds of thousands of dollars, or even more?
Hey, it took around 1 day. I think it can be optimized by increasing the batch size even more :)
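For anyone else budgeting similar runs, here's a minimal back-of-the-envelope sketch using the numbers from this thread (64 H100s, ~1 day). The $/GPU-hour rate is an assumption, not from the thread; substitute your provider's pricing:

```python
# Rough cost estimate for one SmolLM-135M pretraining run.
NUM_GPUS = 64
RUN_HOURS = 24             # "around 1 day" per the answer above
USD_PER_GPU_HOUR = 3.00    # ASSUMED cloud rate; varies widely by provider

gpu_hours = NUM_GPUS * RUN_HOURS
cost = gpu_hours * USD_PER_GPU_HOUR

# Sanity check on the model card numbers: 600B tokens over 600k steps
tokens_per_step = 600e9 / 600_000  # = 1M tokens per optimizer step

print(f"GPU-hours: {gpu_hours}")            # 1536
print(f"Estimated cost: ${cost:,.0f}")      # ~$4,600 at $3/GPU-hour
print(f"Tokens/step: {tokens_per_step:,.0f}")
```

At typical cloud rates this puts a single full run in the low thousands of dollars, so architecture-ablation runs at this scale should land well under the "tens of thousands" mark.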
thank you so much!
sirus changed discussion status to closed