Transfer learning?

#3
by Zemulax - opened

Hi there, I just wanted to know if you pretrained this model without using GPT or any other model as a boost, i.e., from literal scratch, where you did not load any pretrained checkpoint. I need help.
Thanks

Hi, @Zemulax !

Yes, it was trained from scratch, without using any other model.
I specifically used the command line listed under "Creating a model on the fly" in the Transformers examples:
[Screenshot of the command line from the "Creating a model on the fly" section]
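It looks roughly like this (the `--config_overrides` values and dataset below are illustrative, not the exact ones used for Minueza-32M):

```bash
# Rough shape of the "Creating a model on the fly" command from the
# Transformers language-modeling examples: --model_type plus
# --config_overrides builds a fresh, randomly initialized model
# instead of loading a pretrained checkpoint.
python run_clm.py \
    --model_type gpt2 \
    --tokenizer_name openai-community/gpt2 \
    --config_overrides="n_embd=1024,n_head=16,n_layer=48,n_positions=1024" \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --do_train \
    --output_dir /tmp/test-clm
```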

You can also read more about the making of this model here:
The making of Minueza-32M: Transformer model trained from scratch

I read your incredible story. It's similar to what I want to achieve.
However, I have 5 billion tokens at my fingertips that I want to utilise. I am struggling with the learning rate. How do I set it, and which LR is suitable for my situation? I have done research but still cannot come to a conclusion. Please help

Ah, the learning rate...
I believe each dataset has its own unique LR sweet spot.
Before actually starting to train the model, I suggest doing a warmup training (using only 10K samples from your dataset) with 4 different LRs and then checking which one produces the best responses. That will at least give you a better starting point.
The first four LRs that I try are: 5e-5, 5e-6, 8e-7, and 2e-4.
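For example, with the same run_clm.py script, the warmup runs could look like this sketch (the dataset name is a placeholder; `--max_train_samples` caps each run at 10K samples):

```bash
# Hypothetical warmup sweep: train briefly on 10K samples at each
# candidate LR, then compare which checkpoint gives the best responses.
for lr in 5e-5 5e-6 8e-7 2e-4; do
    python run_clm.py \
        --model_type gpt2 \
        --tokenizer_name openai-community/gpt2 \
        --dataset_name your_dataset_name \
        --max_train_samples 10000 \
        --learning_rate "$lr" \
        --do_train \
        --output_dir "/tmp/lr-sweep-$lr"
done
```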

Thank you, Victor. And oh, how much did it cost you to pretrain? What GPUs did you use, and which cloud provider?

I trained Minueza-32M entirely locally, on a MacBook M1. It took some weeks, and I thought I'd see an increase in the electricity bill, but in the end I didn't notice any difference, so I'd say there were no costs.

Wow, awesome. Thank you bro, this has been helpful. I am taking it a step further by pretraining something similar to GPT-1 or GPT-2 small. It's quite a journey, I must say

Felladrin changed discussion status to closed
