metadata

language:
  - code
datasets:
  - nuprl/EditPackFT
library_name: transformers
pipeline_tag: text2text-generation
tags:
  - code
model-index:
  - name: EditCoder-6.7b-v1
    results:
      - task:
          type: text-generation
        dataset:
          type: nuprl/CanItEdit
          name: CanItEdit Descriptive
        metrics:
          - name: pass@1
            type: pass@1
            value: 0.4815
            verified: false
      - task:
          type: text-generation
        dataset:
          type: nuprl/CanItEdit
          name: CanItEdit Lazy
        metrics:
          - name: pass@1
            type: pass@1
            value: 0.3696
            verified: false

EditCoder-6.7b (version 1) is a fine-tuned version of DeepSeek Coder (base model, 6.7B parameters) for instructional code editing. It was fine-tuned on the EditPackFT dataset and achieves state-of-the-art code-editing performance among non-distilled open-source models on the CanItEdit benchmark, with a pass@1 of 0.4815 on descriptive instructions and 0.3696 on lazy instructions.

More information can be found in our paper (arXiv:2312.12450).

Citation

If you use our work, please cite our paper as follows:

@misc{cassano2023edit,
      title={Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions}, 
      author={Federico Cassano and Luisa Li and Akul Sethi and Noah Shinn and Abby Brennan-Jones and Anton Lozhkov and Carolyn Jane Anderson and Arjun Guha},
      year={2023},
      eprint={2312.12450},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}

Prompt

The model has been trained on the following prompt format:

## Code Before:
{before}
## Instruction:
{instruction}
## Code After:
{after}

The following Python function formats the prompt correctly. It stops after the "## Code After:" header so that the model generates the edited code:

def edit_prompt(old, instr):
    """Build the prompt; the model completes the text after '## Code After:'."""
    before = f"## Code Before:\n{old}\n"
    instruction = f"## Instruction:\n{instr}\n"
    after = "## Code After:\n"
    return before + instruction + after
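
As an illustration of how the prompt format above might be used end to end, here is a minimal sketch. Several parts are assumptions rather than something this card specifies: the repository ID "nuprl/EditCoder-6.7b-v1" is inferred from the model name in the metadata, greedy decoding and the 512-token limit are arbitrary choices, and trim_completion is a hypothetical helper that cuts the output if the model starts a new "## Code Before:" section.

```python
def edit_prompt(old, instr):
    """Build the prompt; the model completes the text after '## Code After:'."""
    before = f"## Code Before:\n{old}\n"
    instruction = f"## Instruction:\n{instr}\n"
    after = "## Code After:\n"
    return before + instruction + after


def trim_completion(completion):
    """Hypothetical helper: keep only the edited code from a raw completion."""
    # Assumption: a new "## Code Before:" header means the model ran past the edit.
    stop = completion.find("\n## Code Before:")
    if stop != -1:
        completion = completion[:stop]
    return completion.rstrip() + "\n"


def generate_edit(old, instr, model_id="nuprl/EditCoder-6.7b-v1", max_new_tokens=512):
    """Sketch of one edit with transformers; model_id is an assumed repository ID."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = edit_prompt(old, instr)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Decode only the newly generated tokens, not the prompt itself.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return trim_completion(tokenizer.decode(new_tokens, skip_special_tokens=True))


if __name__ == "__main__":
    # Building the prompt needs no model download.
    print(edit_prompt("def add(a, b):\n    return a - b", "Fix the bug in add."))
```

Calling generate_edit downloads the model weights, so it is left out of the example run; on a GPU machine you would likely also pass a dtype and device placement to from_pretrained.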

Train Your Own EditCoder

We provide the full pipeline that was used to train EditCoder. The pipeline and instructions can be found in our GitHub repository.