metadata
language:
- code
datasets:
- nuprl/EditPackFT
library_name: transformers
pipeline_tag: text2text-generation
tags:
- code
model-index:
- name: EditCoder-6.7b-v1
  results:
  - task:
      type: text-generation
    dataset:
      type: nuprl/CanItEdit
      name: CanItEdit Descriptive
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.4815
      verified: false
  - task:
      type: text-generation
    dataset:
      type: nuprl/CanItEdit
      name: CanItEdit Lazy
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.3696
      verified: false
EditCoder-6.7b (version 1) is a fine-tuned version of DeepSeek Coder (base model, 6.7B parameters) for instructional code editing. It was fine-tuned on the EditPackFT dataset and, on the CanItEdit benchmark, shows state-of-the-art performance among non-distilled open-source models for code editing.
More information can be found in our paper.
Citation
If you use our work, please cite our paper:
@misc{cassano2023edit,
title={Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions},
author={Federico Cassano and Luisa Li and Akul Sethi and Noah Shinn and Abby Brennan-Jones and Anton Lozhkov and Carolyn Jane Anderson and Arjun Guha},
year={2023},
eprint={2312.12450},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
Prompt
The model has been trained on the following prompt format:
## Code Before:
{before}
## Instruction:
{instruction}
## Code After:
{after}
Here is a Python function that formats the prompt correctly:
def edit_prompt(old, instr):
    # Build the prompt up to the "## Code After:" header; the model is expected
    # to continue from there with the edited code.
    before = f"""## Code Before:\n{old}\n"""
    instr = f"""## Instruction:\n{instr}\n"""
    after = f"""## Code After:\n"""
    return before + instr + after
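The prompt format above can be combined with the transformers library to produce an edit. The following is a minimal sketch, not the authors' evaluation setup. It assumes the model is published on the Hugging Face Hub as `nuprl/EditCoder-6.7b-v1` (inferred from the model name in the metadata), uses greedy decoding, and introduces two hypothetical helpers, `trim_completion` and `generate_edit`. The stop markers are also an assumption: they cut the output if the model starts a new prompt section after the edited code.

```python
# Assumption: section headers from the prompt format mark the end of the edited code.
STOP_MARKERS = ("\n## Code Before:", "\n## Instruction:", "\n## Code After:")


def edit_prompt(old, instr):
    """Format the prompt as described above (repeated so this sketch is self-contained)."""
    return f"## Code Before:\n{old}\n## Instruction:\n{instr}\n## Code After:\n"


def trim_completion(completion):
    """Hypothetical helper: keep only the text before the first stop marker, if any."""
    cut = len(completion)
    for marker in STOP_MARKERS:
        idx = completion.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]


def generate_edit(old, instr, model_id="nuprl/EditCoder-6.7b-v1", max_new_tokens=512):
    """Hypothetical helper: generate an edited version of `old`.

    `model_id` is an assumed Hub identifier; adjust it to the actual checkpoint.
    """
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = edit_prompt(old, instr)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding is a choice made for this sketch, not a documented setting.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Decode only the newly generated tokens, skipping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    completion = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return trim_completion(completion)
```

Calling `generate_edit("def add(a, b):\n    return a - b", "Fix the bug in add.")` would download the checkpoint and return the model's proposed replacement code. Loading a 6.7B-parameter model this way needs substantial memory, so in practice one might also pass a dtype or device placement option to `from_pretrained`.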
Train Your Own EditCoder
We provide the full pipeline that was used to train EditCoder. The pipeline and instructions are available in our GitHub repository.