"Mega Model"?
#1, opened by Enderchef
A cool idea would be to take all the datasets you have and fine-tune Qwen3 0.6B on all of them at once to make a "Mega Model".
Hello @Enderchef,
Training an LLM on that many datasets at once won't produce a good model, because they all draw on the same question pool. It would only pay off if the datasets covered different specialized topics.
As an alternative, would you like a distill of Qwen3 0.6B trained on one of my datasets?
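For anyone who wants to check the "same question pool" point on their own data, here is a rough sketch of one way to measure it: the Jaccard overlap between the question sets of two datasets. The function names and the toy questions are made up for illustration, and it assumes each dataset can be reduced to a plain list of question strings. It only catches near-exact duplicates, not paraphrases.

```python
def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(question.lower().split())


def question_overlap(questions_a, questions_b) -> float:
    """Jaccard similarity between two collections of questions (0.0 to 1.0)."""
    set_a = {normalize(q) for q in questions_a}
    set_b = {normalize(q) for q in questions_b}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical toy data, not taken from any real dataset.
dataset_a = ["What is 2 + 2?", "Name the capital of France.", "Explain recursion."]
dataset_b = ["what is 2 + 2?", "Name the capital of  France.", "What is a monad?"]

overlap = question_overlap(dataset_a, dataset_b)
print(f"Question overlap: {overlap:.2f}")
```

A value close to 1.0 would suggest the two datasets mostly repeat the same prompts, so combining them adds little new coverage. A value close to 0.0 would suggest they cover different questions.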