inference: false
datasets:
- bigcode/commitpackft
model-index:
- name: patched-coder-34b
  results:
  - task:
      type: text-generation
    dataset:
      type: openai_humaneval
      name: HumanEval
    metrics:
    - name: pass@1
      type: pass@1
      value: 53.567
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalFix Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 41.341
      verified: false
  - task:
      type: text-generation
    dataset:
      type: patched-codes/static-analysis-eval
      name: Static Analysis Eval
    metrics:
    - name: pass@1
      type: pass@1
      value: 51.316
      verified: false
Model Card for patched-coder-34b
This is an instruction fine-tuned model focused on the task of patching code. Patching may include fixing bugs, remediating security vulnerabilities, performing API migrations, and other kinds of code maintenance.
Model Details
Model Description
- Developed by: codelion
- Model type: Code Llama
- Finetuned from model: CodeLlama-34b-Python
How to Get Started with the Model
Make sure to install Transformers from the main git branch:
pip install git+https://github.com/huggingface/transformers.git
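Once Transformers is installed, loading the model and generating a patch might look like the sketch below. It is not the authors' reference code. The `model_id` argument is left to the caller because this card does not state the Hub repository id; the `bfloat16` dtype, `device_map="auto"` (which needs the `accelerate` package), and the `max_new_tokens` default are all assumptions. `load_model`, `extract_response`, and `generate_patch` are hypothetical helper names.

```python
def load_model(model_id: str):
    """Load tokenizer and model; `model_id` must be supplied by the caller."""
    # Imported lazily so the pure helper below works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit your hardware
        device_map="auto",           # assumption: requires `accelerate`
    )
    return tokenizer, model


def extract_response(decoded: str) -> str:
    """Return the text after the first '### Response:' marker, if present."""
    marker = "### Response:"
    _, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()


def generate_patch(tokenizer, model, prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a completion for `prompt` and strip the echoed prompt text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return extract_response(decoded)
```

A 34B-parameter model needs substantial GPU memory, so a quantized load may be necessary on smaller hardware.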
How to Prompt the Model
This model accepts the Alpaca instruction format.
For example:
### Instruction:
{instruction}
### Input:
{input}
### Response:
...
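The template above can be filled in with a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name, and both the blank line between sections and the choice to drop the Input section when no input is given are assumptions based on common Alpaca-style usage, not something this card specifies.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the Alpaca-style template with an instruction and optional input."""
    parts = [f"### Instruction:\n{instruction}"]
    if input_text:
        # Assumption: the Input section is omitted when there is no input.
        parts.append(f"### Input:\n{input_text}")
    # The model is expected to continue after the Response header.
    parts.append("### Response:\n")
    return "\n\n".join(parts)


prompt = build_prompt(
    "Fix the off-by-one error in the loop.",
    "for i in range(len(xs) + 1):\n    print(xs[i])",
)
```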
Bias, Risks, and Limitations
This model has undergone very limited testing. Additional safety testing should be performed before any real-world deployments.
Training Details
- GPU: A100 80 GB
- Time: ~8 hrs
Training Data
The model was fine-tuned on commitpackft, an open dataset consisting of commits.
We started with the commits for the Python language in the dataset and then filtered for the commits related to fixing bugs.
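The card does not say how bug-fix commits were identified. One plausible approach is a keyword match on the commit message, sketched below. The keyword pattern, the helper names, and the record fields (`lang`, `subject`, `message`) are assumptions for illustration and may not match the filter that was actually used.

```python
import re

# Hypothetical keyword heuristic; the real filter is not described in this card.
BUG_FIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bugs?|patch(es|ed)?)\b", re.IGNORECASE)


def is_bug_fix(commit: dict) -> bool:
    """Return True if the commit subject or message mentions a fix."""
    text = f"{commit.get('subject', '')} {commit.get('message', '')}"
    return bool(BUG_FIX_PATTERN.search(text))


def select_bug_fix_commits(commits):
    """Keep Python commits whose message looks like a bug fix."""
    return [
        c for c in commits
        if str(c.get("lang", "")).lower() == "python" and is_bug_fix(c)
    ]


sample = [
    {"lang": "Python", "subject": "Fix off-by-one error", "message": ""},
    {"lang": "Python", "subject": "Add README", "message": ""},
    {"lang": "Python", "subject": "Handle prefix option", "message": ""},
    {"lang": "JavaScript", "subject": "Fix typo", "message": ""},
]
selected = select_bug_fix_commits(sample)
```

A keyword match like this is cheap but noisy, so a real pipeline would likely need manual spot checks.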
Training Procedure
The model was instruction fine-tuned to follow natural-language instructions related to code. We load the quantized base model in 4 bits and then use QLoRA for Parameter-Efficient Fine-Tuning (PEFT) with Flash Attention. The model was trained for 2 epochs.
Training Hyperparameters
Training regime:
The following bitsandbytes quantization config was used during training:
- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
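In Python, the values listed above map onto the keyword arguments of `BitsAndBytesConfig` in Transformers. The sketch below is one way to express them, not the original training script. `QUANT_KWARGS` and `make_bnb_config` are hypothetical names, and `quant_method` is left out because it is recorded by the library rather than passed by the user.

```python
# Values copied from the list above; `quant_method` is omitted on purpose.
QUANT_KWARGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",  # dtype name; resolved to a torch dtype by the config
}


def make_bnb_config():
    """Build the quantization config (needs `transformers` and `torch` installed)."""
    # Imported lazily so the dictionary above can be inspected without heavy dependencies.
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**QUANT_KWARGS)
```

The resulting object can be passed as `quantization_config=make_bnb_config()` to `AutoModelForCausalLM.from_pretrained`.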
Evaluation
We evaluate the model on the HumanEval and HumanEvalPack benchmarks using the Code Generation LM Evaluation Harness.
We also evaluate the model for vulnerability remediation using the Static Analysis Eval benchmark (patched-codes/static-analysis-eval).
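For readers unfamiliar with the pass@1 scores reported for these benchmarks, the sketch below shows the unbiased pass@k estimator commonly used with HumanEval-style evaluations. The card does not state how many samples per problem were drawn, so the numbers in the usage line are illustrative only, and `pass_at_k` is a hypothetical helper name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which pass the tests."""
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    # 1 minus the probability that k samples drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative values: 10 samples, 3 passing, k = 1 gives the pass fraction 3/10.
example = pass_at_k(10, 3, 1)
```

The benchmark score is the mean of this per-problem estimate over all problems; for k = 1 it reduces to the fraction of samples that pass.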