Transfer learning?
Hi there, I just wanted to know if you pretrained this model without using GPT or any other model as a boost, i.e. from literal scratch, where you did not load any pretrained checkpoint. I need help.
Thanks
Hi, @Zemulax!
Yes, it was trained from scratch, without using any other model.
Specifically, I used the command line listed under "Creating a model on the fly" in the Transformers examples:
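Something along these lines (the config override values and dataset here are illustrative placeholders, not the exact Minueza-32M settings; the real details are in the article linked below):

```sh
python run_clm.py \
    --model_type gpt2 \
    --tokenizer_name gpt2 \
    --config_overrides "n_embd=384,n_head=6,n_layer=6" \
    --dataset_name wikitext \
    --dataset_config_name wikitext-103-raw-v1 \
    --do_train \
    --do_eval \
    --output_dir ./my-model-from-scratch
```

The key point is passing `--model_type` (instead of `--model_name_or_path`), which makes the script build a fresh config and randomly initialized weights rather than loading a pretrained checkpoint.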
You can also read more about the making of this model here:
The making of Minueza-32M: Transformer model trained from scratch
I read your incredible story. It's similar to what I want to achieve.
However, I have 5 billion tokens at my fingertips that I want to utilise. I am struggling with the learning rate. How do I set it, and which LR is suitable for my situation? I have done research but still cannot reach a conclusion. Please help.
Ah, the learning rate...
I believe each dataset has its own unique LR sweet spot.
Before actually starting to train the model, I suggest doing a warmup training run (using only 10K samples from your dataset) with 4 different LRs and then checking which one produced the best responses. Then you'll at least have a better starting point.
The first four LRs that I try are: 5e-5, 5e-6, 8e-7, 2e-4.
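As a rough sketch, that sweep can be scripted around the same run_clm.py example (the sample file, step count, and config values here are assumptions just to illustrate the idea):

```sh
# Hypothetical warmup sweep: one short run per candidate learning rate,
# each trained on a small slice of the dataset and saved to its own directory.
for LR in 5e-5 5e-6 8e-7 2e-4; do
    python run_clm.py \
        --model_type gpt2 \
        --tokenizer_name gpt2 \
        --config_overrides "n_embd=384,n_head=6,n_layer=6" \
        --train_file data/warmup_10k_samples.txt \
        --do_train \
        --learning_rate "$LR" \
        --max_steps 500 \
        --output_dir "runs/lr-sweep-$LR"
done
```

Afterwards, compare the training losses and sample generations from the four `runs/lr-sweep-*` checkpoints and carry the best-looking LR into the full training run.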
Thank you, Victor. And oh, how much did it cost you to pretrain? What GPUs did you use, and which cloud provider?
I trained Minueza-32M entirely locally, on a MacBook M1. It took some weeks, and I thought I'd see an increase in the electricity bill, but in the end I didn't notice any difference, so I'd say there were no costs.
Wow, awesome. Thank you, bro, this has been helpful. I am taking it a step further by pretraining something similar to GPT-1 or GPT-2 small. It's quite a journey, I must say.