codelion committed on
Commit 93590d0
1 Parent(s): 768fb2e

Update README.md

Files changed (1)
  1. README.md +44 -35

README.md CHANGED
@@ -3,42 +3,42 @@ inference: false
 datasets:
 - bigcode/commitpackft
 model-index:
-- name: patched-coder-34b
-  results:
-  - task:
-      type: text-generation
-    dataset:
-      type: openai_humaneval
-      name: HumanEval
-    metrics:
-    - name: pass@1
-      type: pass@1
-      value: 53.567
-      verified: false
-  - task:
-      type: text-generation
-    dataset:
-      type: bigcode/humanevalpack
-      name: HumanEvalFix Python
-    metrics:
-    - name: pass@1
-      type: pass@1
-      value: 41.341
-      verified: false
-  - task:
-      type: text-generation
-    dataset:
-      type: patched-codes/static-analysis-eval
-      name: Static Analysis Eval
-    metrics:
-    - name: pass@1
-      type: pass@1
-      value: 51.316
-      verified: false
+- name: patched-coder-34b
+  results:
+  - task:
+      type: text-generation
+    dataset:
+      type: openai_humaneval
+      name: HumanEval
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 53.567
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: bigcode/humanevalpack
+      name: HumanEvalFix Python
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 41.341
+      verified: false
+  - task:
+      type: text-generation
+    dataset:
+      type: patched-codes/static-analysis-eval
+      name: Static Analysis Eval
+    metrics:
+    - name: pass@1
+      type: pass@1
+      value: 51.316
+      verified: false
+license: llama2
 ---
 # Model Card for patched-coder-34b

-
 This is an instruction fine-tuned model focused on the task of patching code. Patching may include fixing bugs, remediating security vulnerabilities,
 doing API migrations and other kinds of code maintenance.
 
@@ -113,9 +113,18 @@ The following `bitsandbytes` quantization config was used during training:

 ## Evaluation

-We evaluate the model on `HumanEval` and `HumanEvalPack` benchmarks using
+We evaluated the model on `HumanEval` (for code generation) and `HumanEvalFix Python` (for bug fixing) benchmarks using
 [Code Generation LM Evaluation Harness](https://github.com/bigcode-project/bigcode-evaluation-harness).

-We also evaluate the model for vulnerability remediation using the `Static Analysis Eval` benchmark available [here](https://huggingface.co/datasets/patched-codes/static-analysis-eval).
+To evaluate the model for vulnerability remediation, we used the `Static Analysis Eval` benchmark available [here](https://huggingface.co/datasets/patched-codes/static-analysis-eval).

 ### Results
+
+| Model                | HumanEval | HumanEvalFix Python | Static Analysis Eval |
+| -------------------- | --------- | ------------------- | -------------------- |
+| GPT-4                | 86.6      | 47                  | 55.26                |
+| patched-coder-34b    | 53.57     | 41.34               | 51.32                |
+| CodeLlama-34b-Python | 53.29     | 33.14               | 27.63                |
+
+Based on these benchmark results, patched-coder-34b is the SOTA open code LLM. Other code LLMs (e.g. from WizardCoder and Phind) are trained
+either on unknown proprietary datasets or on outputs from OpenAI's APIs, making them unviable for commercial use.
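
The evaluation command itself is not recorded in the diff. As a point of reference, here is a minimal sketch of how one might reproduce the `HumanEval` pass@1 number with the harness, assuming its documented `accelerate launch main.py` entry point; the repo id, task names, and generation settings below are assumptions, not values taken from this commit:

```python
# Hedged sketch: drive the bigcode-evaluation-harness from Python.
# Flag names follow the harness README; swap "--tasks" to
# "humanevalfixtests-python" (assumed task id) for HumanEvalFix Python.
import subprocess

subprocess.run(
    [
        "accelerate", "launch", "main.py",
        "--model", "patched-codes/patched-coder-34b",  # assumed Hub repo id
        "--tasks", "humaneval",
        "--n_samples", "1",          # one generation per problem -> pass@1
        "--batch_size", "1",
        "--allow_code_execution",    # harness will not run candidate code without this
        "--metric_output_path", "humaneval_results.json",
    ],
    check=True,
)
```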
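`Static Analysis Eval` is an ordinary Hub dataset, so it can be inspected directly before building patching prompts. A small sketch, assuming a `train` split (the schema is not shown in this diff):

```python
# Hedged sketch: load the vulnerability-remediation benchmark from the Hub.
# The split name is an assumption; print the features to see the real schema.
from datasets import load_dataset

ds = load_dataset("patched-codes/static-analysis-eval", split="train")
print(len(ds))       # number of evaluation samples
print(ds.features)   # field names/types available for prompt construction
```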
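Every column in the results table reports pass@1. For reference, the standard unbiased pass@k estimator from the Codex paper (Chen et al., 2021) is sketched below; with one sample per problem it reduces to the plain fraction of problems solved, which is how a single pass@1 percentage like those above is obtained:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples, drawn
    without replacement from n generations of which c are correct, passes."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some sample must pass
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# With k = 1 this is just the success rate, matching the table's pass@1:
assert pass_at_k(1, 1, 1) == 1.0
assert pass_at_k(1, 0, 1) == 0.0
```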