# m2mKD
This repository contains the checkpoints for m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers.
## Released checkpoints
For instructions on using the checkpoints listed below, please refer to our GitHub repo; a minimal loading sketch follows the list.
- `nac_scale_tinyimnet.pth` / `nac_scale_imnet.pth`: NAC models with a scale-free prior trained using m2mKD.
- `vmoe_base.pth`: V-MoE-Base model trained using m2mKD.
- `FT_huge`: a directory containing DeiT-Huge teacher modules for NAC model training.
- `nac_tinyimnet_students`: a directory containing NAC student modules for Tiny-ImageNet.
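As a minimal sketch only (not the official loading code; the actual usage is documented on our GitHub repo), a `.pth` checkpoint can typically be inspected with PyTorch. The file name below is a placeholder for whichever checkpoint you downloaded:

```python
import torch

# Placeholder path; point this at the checkpoint you downloaded from this repo.
ckpt_path = "nac_scale_tinyimnet.pth"

# Load onto CPU so no GPU is required just to inspect the file.
ckpt = torch.load(ckpt_path, map_location="cpu")

# Checkpoints are commonly either a raw state_dict or a dict wrapping one
# (e.g. under a "model" key); print the top-level keys to find out.
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])
```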
## Acknowledgement
Our implementation is mainly based on Deep-Incubation.
## Citation
If you use the checkpoints, please cite our paper:
```bibtex
@misc{lo2024m2mkd,
      title={m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers},
      author={Ka Man Lo and Yiming Liang and Wenyu Du and Yuantao Fan and Zili Wang and Wenhao Huang and Lei Ma and Jie Fu},
      year={2024},
      eprint={2402.16918},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```