---
license: mit
base_model:
- Qwen/Qwen2.5-Coder-7B-Instruct
---

`CodeRankLLM` is a 7B LLM fine-tuned for listwise code reranking. When combined with performant code retrievers like [`CodeRankEmbed`](https://huggingface.co/cornstack/CodeRankEmbed), it significantly improves the quality of retrieved results across a variety of code retrieval tasks.

We release the scripts to evaluate our model's performance [here](https://github.com/gangiswag/cornstack).

## Training

Our code reranker builds on LLM-based listwise reranking, which has gained prominence for its ability to score multiple passages simultaneously. We generated the training data for listwise reranking by selecting 50,000 `<query, positive, negatives>` tuples from our high-quality dataset [CoRNStack](https://gangiswag.github.io/cornstack/), filtered to ensure higher similarity scores and better ranks for the positives. Since CoRNStack doesn't contain the ranked ordering data required for training listwise rerankers, we use ranked orderings produced by the [Qwen-2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) LLM for each example as ranking supervision. We initialize our reranker with [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) and fine-tune it with a language modeling objective that minimizes the prediction error of the next token in the sequence.
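
As a rough illustration of what listwise reranking involves, the sketch below numbers a set of candidate snippets in a single prompt and then applies an ordering returned by the model. The prompt wording, the `[2] > [1] > [3]` output format, and the helper names `build_listwise_prompt`, `parse_ranking`, and `rerank` are assumptions made for this example, not the exact format used to train `CodeRankLLM`; see the evaluation scripts for the actual prompt. The model call itself is omitted and replaced by a hard-coded response.

```python
import re


def build_listwise_prompt(query: str, passages: list[str]) -> str:
    """Number each candidate so the model can refer to it by identifier.

    The wording here is a hypothetical prompt, not the one used in training.
    """
    lines = [f"Query: {query}", "", "Candidates:"]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.append("")
    lines.append("Rank the candidates from most to least relevant, e.g. [2] > [1] > [3].")
    return "\n".join(lines)


def parse_ranking(response: str, num_passages: int) -> list[int]:
    """Turn a response such as "[2] > [3] > [1]" into 0-based indices.

    Out-of-range and repeated identifiers are ignored; candidates the model
    did not mention are appended in their original order.
    """
    order: list[int] = []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1
        if 0 <= idx < num_passages and idx not in order:
            order.append(idx)
    missing = [i for i in range(num_passages) if i not in order]
    return order + missing


def rerank(passages: list[str], response: str) -> list[str]:
    """Reorder the retrieved passages according to the model's ranking."""
    return [passages[i] for i in parse_ranking(response, len(passages))]


# Usage with a hard-coded response standing in for the reranker's output.
candidates = [
    "def add(a, b): return a + b",
    "def read_json(path): return json.load(open(path))",
    "def parse_csv(path): return list(csv.reader(open(path)))",
]
prompt = build_listwise_prompt("load a JSON file", candidates)
reranked = rerank(candidates, "[2] > [3] > [1]")
```

Because the whole candidate list is ranked in one generation, the supervision for each training example is simply the target ordering string, which is why a standard next-token objective is sufficient.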

## Citation

If you find the model, dataset, or training code useful, please cite our work:

```bibtex
@misc{suresh2025cornstackhighqualitycontrastivedata,
  title={CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking},
  author={Tarun Suresh and Revanth Gangi Reddy and Yifei Xu and Zach Nussbaum and Andriy Mulyar and Brandon Duderstadt and Heng Ji},
  year={2025},
  eprint={2412.01007},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.01007},
}
```