R1 fine-tuning

#1
by shivamb250 - opened

What if the fine-tuning used for this model were done on R1?

I think you would have to be more specific. Using the same datasets on the DeepSeek-R1 to Qwen2.5 32B R1 distillation? Or some kind of offline logit distillation?
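To make the second option concrete, here is a rough sketch of what "offline logit distillation" usually means: the teacher's logits are computed once and cached, and the student is then trained to match them with a temperature-softened KL loss. This is only an illustration, not what was done for this model. The function names, the temperature, and the logit values are all made up, and it uses plain Python rather than a real training framework.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one token's logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions for one token position."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    # Conventional T^2 factor keeps gradient scale comparable across temperatures.
    return kl * temperature ** 2


# "Offline": the teacher logits were produced earlier and stored (hypothetical values).
cached_teacher_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student_logits = [[1.5, 0.7, -0.5], [0.0, 0.0, 2.0]]

losses = [distill_loss(t, s) for t, s in zip(cached_teacher_logits, student_logits)]
mean_loss = sum(losses) / len(losses)
print(f"mean distillation loss: {mean_loss:.4f}")
```

In practice the cached logits for a large vocabulary are often truncated to the top-k entries per token to keep storage manageable, which this sketch does not attempt.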

I'm not sure training on the 32B R1 would be worth it. It would likely suffer catastrophic forgetting badly enough to lose the fancy CoT, so the only benefit would be a different flavor of prose. Could be interesting but I'm not sure it's worth spending

@Kearm @Fizzarolli On the 671B model. Probably either V3 or R1.

Would love to get the money for that one lol

@Fizzarolli Would love to do it from 17th Feb to 30 Feb, if it takes that much time to tune. Would probably use it to distill a smaller 104B model, like Command R+ maybe.
