This is the 25 MB compressed version of CodeBERT, fine-tuned for the clone detection task on the BigCloneBench dataset.
The compression method is described in our ASE 2022 paper, "Compressing Pre-trained Models of Code into 3 MB".
To use this model, see our GitHub repository: https://github.com/soarsmu/Compressor.git. If you use the model or any code from our repository in your paper, please cite:
@inproceedings{shi2022compressing,
author = {Shi, Jieke and Yang, Zhou and Xu, Bowen and Kang, Hong Jin and Lo, David},
title = {Compressing Pre-Trained Models of Code into 3 MB},
year = {2023},
isbn = {9781450394758},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3551349.3556964},
doi = {10.1145/3551349.3556964},
booktitle = {Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering},
articleno = {24},
numpages = {12},
keywords = {Pre-Trained Models, Model Compression, Genetic Algorithm},
location = {Rochester, MI, USA},
series = {ASE '22}
}