---
license: apache-2.0
datasets:
- lambdasec/cve-single-line-fixes
- lambdasec/gh-top-1000-projects-vulns
language:
- code
tags:
- code
programming_language:
- Java
- JavaScript
- Python
inference: false
model-index:
- name: SantaFixer
  results:
  - task:
      type: text-generation
    dataset:
      type: openai/human-eval-infilling
      name: HumanEval
    metrics:
    - name: single-line infilling pass@1
      type: pass@1
      value: 0.47
      verified: false
    - name: single-line infilling pass@10
      type: pass@10
      value: 0.74
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lambdasec/gh-top-1000-projects-vulns
      name: GH Top 1000 Projects Vulnerabilities
    metrics:
    - name: pass@1 (Java)
      type: pass@1
      value: 0.26
      verified: false
    - name: pass@10 (Java)
      type: pass@10
      value: 0.48
      verified: false
    - name: pass@1 (Python)
      type: pass@1
      value: 0.31
      verified: false
    - name: pass@10 (Python)
      type: pass@10
      value: 0.56
      verified: false
    - name: pass@1 (JavaScript)
      type: pass@1
      value: 0.36
      verified: false
    - name: pass@10 (JavaScript)
      type: pass@10
      value: 0.62
      verified: false
---
# Model Card for SantaFixer
This is an LLM for code that focuses on generating bug fixes using infilling.
## Model Details
### Model Description
- **Developed by:** [codelion](https://huggingface.co/codelion)
- **Model type:** GPT-2
- **Finetuned from model:** [bigcode/santacoder](https://huggingface.co/bigcode/santacoder)
## How to Get Started with the Model
Use the code below to get started with the model.
```python
# pip install -q transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "lambdasec/santafixer"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, trust_remote_code=True
).to(device)

# Fill-in-the-middle prompt: the model generates the code that belongs
# between the prefix and the suffix.
input_text = (
    "<fim-prefix>def print_hello_world():\n    "
    "<fim-suffix>\n    print('Hello world!')"
    "<fim-middle>"
)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
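Because the model produces fixes by infilling, a typical workflow is to cut the suspect line out of a file, pass the code before and after it as prefix and suffix, and splice the generated middle back in. The helpers below are a minimal sketch of that string handling only. The function names are hypothetical, and the end-of-text marker `<|endoftext|>` is an assumption carried over from the SantaCoder base model rather than something this card states.

```python
# Hypothetical helpers for fill-in-the-middle (FIM) prompting.
# The three FIM markers match the prompt format shown above; the
# end-of-text marker is an assumption based on the SantaCoder base model.
FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"
END_OF_TEXT = "<|endoftext|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the gap in FIM markers."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def extract_middle(decoded: str) -> str:
    """Return the text generated after the middle marker.

    Anything from the end-of-text marker onward is dropped.
    """
    _, _, middle = decoded.partition(FIM_MIDDLE)
    middle, _, _ = middle.partition(END_OF_TEXT)
    return middle


def apply_fix(prefix: str, suffix: str, decoded: str) -> str:
    """Splice the generated middle back between prefix and suffix."""
    return prefix + extract_middle(decoded) + suffix
```

With these helpers, `build_fim_prompt(prefix, suffix)` would replace the hand-written `input_text`, and `apply_fix(prefix, suffix, tokenizer.decode(outputs[0]))` would return the patched source.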
## Training Details
- **GPU:** Tesla P100
- **Time:** ~5 hrs
### Training Data
The model was fine-tuned on the [CVE single line fixes dataset](https://huggingface.co/datasets/lambdasec/cve-single-line-fixes).
### Training Procedure
Supervised fine-tuning (SFT).
#### Training Hyperparameters
- **optim:** adafactor
- **gradient_accumulation_steps:** 4
- **gradient_checkpointing:** true
- **fp16:** false
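The hyperparameter names above match fields of `transformers.TrainingArguments`, so one plausible way to reproduce the setup is shown below. This is a sketch under the assumption that the Hugging Face `Trainer` was used; the card does not confirm this. `build_training_arguments` and its `output_dir` parameter are hypothetical, and unlisted settings such as learning rate, batch size, and epoch count are left at library defaults because the card does not report them.

```python
# Hyperparameters exactly as listed in this card.
TRAINING_KWARGS = {
    "optim": "adafactor",
    "gradient_accumulation_steps": 4,
    "gradient_checkpointing": True,
    "fp16": False,
}


def build_training_arguments(output_dir: str):
    """Build TrainingArguments from the listed hyperparameters.

    Assumption: the model was trained with the Hugging Face Trainer.
    The import is local so the settings above can be inspected without
    transformers installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Gradient accumulation and gradient checkpointing both trade speed for memory, which is consistent with fitting the fine-tuning run on a single Tesla P100.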
## Evaluation
The model was tested with the [GitHub top 1000 projects vulnerabilities dataset](https://huggingface.co/datasets/lambdasec/gh-top-1000-projects-vulns).
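The metadata of this card reports pass@1 and pass@10 scores. The card does not say how they were computed; the sketch below shows the commonly used unbiased pass@k estimator, which for a problem with `n` sampled fixes of which `c` are correct evaluates to `1 - C(n-c, k) / C(n, k)`. The function names are hypothetical, and nothing here is claimed to be the authors' evaluation script.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without
    replacement from n generated samples of which c are correct,
    is correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a hit.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n, c)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For `k = 1` the estimator reduces to the fraction of correct samples `c / n`, which is why pass@1 is always the lower of the two reported numbers.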