Phenomenon of saturation not reached?
#6, opened by DrNicefellow
One purpose of training TinyLlama is to study the phenomenon of saturation, yet saturation does not seem to have been reached after 3T tokens. Do you think it would be reasonable to continue training until saturation? If so, a careful choice of learning rate could be important.
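To make the learning-rate point concrete, here is a minimal sketch of one common option for continued training: a short linear re-warmup from a floor value followed by cosine decay back to that floor. The function name `continued_lr` and all numeric values are hypothetical, chosen only for illustration; they are not TinyLlama's actual settings.

```python
import math


def continued_lr(step, total_steps, warmup_steps, peak_lr, min_lr):
    """Learning rate for a continued-training run (illustrative sketch).

    Linearly re-warms from min_lr to peak_lr over warmup_steps, then
    follows a cosine decay from peak_lr back down to min_lr.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear re-warmup phase.
        return min_lr + (peak_lr - min_lr) * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine decay from peak_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical values, not taken from the TinyLlama configuration.
    for s in (0, 50, 100, 550, 1000):
        print(s, continued_lr(s, total_steps=1000, warmup_steps=100,
                              peak_lr=1e-4, min_lr=4e-5))
```

A lower re-warmup peak than the original run's peak is one way to limit disruption to what the model has already learned, but the right value would need to be found empirically.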