Can you distill qwen-2.5-72b?
#30, opened by xldistance
Can you distill qwen-2.5-72b?
S/he of course meant distilling DeepSeek R1 using Qwen2.5 72B as the student model (if you still have some GPUs left for that :) ).
(So far only the smaller Qwen2.5 32B has been used in that role, not the 72B. That may explain why the Qwen 32B distillation was beaten on many benchmarks by the distillation that used the more than twice as large Llama 3.x 70B as the student model.)
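For context on what "distilling into a student" means here: as far as I know, DeepSeek's published distilled models were produced by plain supervised fine-tuning of the student on reasoning samples generated by R1, not by matching the teacher's logits. The toy sketch below only illustrates the data-building half of that idea, assuming a hypothetical `teacher_generate` function (stubbed with a trivial adder instead of a real model), a simple answer-matching filter, and a `<think>…</think>` wrapper for the trace. None of these names come from DeepSeek's code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SFTExample:
    prompt: str
    completion: str  # teacher's reasoning trace plus final answer


def build_sft_dataset(
    problems: list[tuple[str, str]],
    teacher_generate: Callable[[str], tuple[str, str]],
    samples_per_prompt: int = 4,
) -> list[SFTExample]:
    """Sample the (hypothetical) teacher and keep only answers that match the reference."""
    dataset = []
    for prompt, reference in problems:
        for _ in range(samples_per_prompt):
            trace, answer = teacher_generate(prompt)
            if answer.strip() == reference.strip():
                # Assumed output format: reasoning wrapped in <think> tags, then the answer.
                dataset.append(SFTExample(prompt, f"<think>{trace}</think>\n{answer}"))
                break  # one accepted sample per prompt is enough for this toy
    return dataset


def toy_teacher(prompt: str) -> tuple[str, str]:
    # Hypothetical stand-in for the teacher model: parses "a+b" and answers it.
    a, b = (int(x) for x in prompt.split("+"))
    return f"Adding {a} and {b}.", str(a + b)


# The last reference answer is deliberately wrong, so that prompt gets filtered out.
problems = [("2+3", "5"), ("10+7", "17"), ("1+1", "3")]
data = build_sft_dataset(problems, toy_teacher)
print(len(data))  # 2
```

The student (e.g. Qwen2.5 72B) would then be fine-tuned on these prompt/completion pairs with an ordinary next-token loss; that training step is not shown here.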