jiekeshi/CodeBERT-25MB-Clone-Detection

This is the 25 MB compressed version of CodeBERT, fine-tuned for the clone detection task on the BigCloneBench dataset.

The compression method is described in our ASE 2022 paper, "Compressing Pre-trained Models of Code into 3 MB".

To use this model, see our GitHub repository: https://github.com/soarsmu/Compressor.git. If you use the model or any code from our repository in your paper, please cite:

@inproceedings{shi2022compressing,
  author = {Shi, Jieke and Yang, Zhou and Xu, Bowen and Kang, Hong Jin and Lo, David},
  title = {Compressing Pre-Trained Models of Code into 3 MB},
  year = {2023},
  isbn = {9781450394758},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3551349.3556964},
  doi = {10.1145/3551349.3556964},
  booktitle = {Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering},
  articleno = {24},
  numpages = {12},
  keywords = {Pre-Trained Models, Model Compression, Genetic Algorithm},
  location = {Rochester, MI, USA},
  series = {ASE '22}
}
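
As a starting point for using the model, the sketch below downloads the files of this model repository to a local folder. It assumes that the name at the top of this card, `jiekeshi/CodeBERT-25MB-Clone-Detection`, is a Hugging Face Hub repository id and that the `huggingface_hub` package is installed. The helper name `fetch_model_files` is hypothetical and used only for illustration. Loading the downloaded weights is not shown, since it likely depends on the model definition in the GitHub repository above.

```python
from pathlib import Path

# Hub repository id, taken from the title of this model card (assumed to be a Hub id).
REPO_ID = "jiekeshi/CodeBERT-25MB-Clone-Detection"


def fetch_model_files(repo_id: str = REPO_ID) -> Path:
    """Download every file in the Hub repository and return the local folder.

    Hypothetical helper for illustration; it is not part of the Compressor code.
    """
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the local snapshot directory.
    return Path(snapshot_download(repo_id=repo_id))
```

Calling `fetch_model_files()` needs network access. The returned folder can then be inspected with `sorted(p.name for p in fetch_model_files().iterdir())` to see which files the repository actually provides.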