BERT LLVM IR Vulnerability Detection
This model is a BERT-based classifier trained to label program slices represented in LLVM IR as vulnerable or non-vulnerable.
Model
Base model: bert-base-uncased
Task: Binary classification
Labels
0: Non-vulnerable
1: Vulnerable
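The label ids above can be turned into readable names and a confidence score once the two class logits are available. The following is a minimal sketch, not part of the published code: the helper name `interpret_logits` and the `ID2LABEL` dictionary are hypothetical, and the example logits are made up for illustration.

```python
import math

# Label names copied from the Labels section of this card.
ID2LABEL = {0: "Non-vulnerable", 1: "Vulnerable"}

def interpret_logits(logits):
    """Convert a pair of raw class logits into (label name, probability).

    Hypothetical helper for illustration; not part of the model repository.
    """
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the class with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

# Made-up logits, standing in for outputs.logits[0].tolist() from the model.
label, prob = interpret_logits([-1.2, 2.3])
print(label, round(prob, 3))
```

Reporting the probability alongside the label makes it easier to set a review threshold rather than relying on the bare argmax.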
Repository
Training and preprocessing code:
https://github.com/forwardtech/bert-llvm-ir-vulnerability-detection
Dataset
This model was trained on data derived from the VulDeeLocator dataset listed under References.
This repository does not redistribute the dataset. Please obtain the dataset from the original sources referenced below and follow their licensing terms.
References
Original Dataset
Zhen Li, Deqing Zou, Shouhuai Xu, Zhaoxuan Chen, Yawei Zhu, and Hai Jin.
VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector.
IEEE Transactions on Dependable and Secure Computing.
@article{vuldeelocator,
title={VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector},
author={Li, Zhen and Zou, Deqing and Xu, Shouhuai and Chen, Zhaoxuan and Zhu, Yawei and Jin, Hai},
journal={IEEE Transactions on Dependable and Secure Computing},
year={2021}
}
Follow-up Research
Andrew Arash Mahyari.
Harnessing the Power of LLMs in Source Code Vulnerability Detection.
@article{mahyari_llm_vuln_detection,
title={Harnessing the Power of LLMs in Source Code Vulnerability Detection},
author={Mahyari, Andrew Arash},
journal={IEEE},
year={2023}
}
Intended Use
This model is intended for research and experimentation in vulnerability detection using LLVM IR.
It should not be used as a standalone production vulnerability scanner.
Usage
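from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "chidamtek/bert-llvm-ir-vulnerability-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # make sure dropout is disabled for inference

code = "LLVM IR snippet here"  # replace with an LLVM IR program slice
# Inputs longer than the 512-token limit of bert-base-uncased are truncated.
inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # gradients are not needed for prediction
    outputs = model(**inputs)
prediction = torch.argmax(outputs.logits, dim=-1).item()
print(prediction)  # 0 = non-vulnerable, 1 = vulnerable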
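Because bert-base-uncased accepts at most 512 tokens, truncation silently drops the tail of a long LLVM IR slice. One possible workaround, sketched below, is to split the token ids into overlapping windows and classify each window separately. This is an assumption for illustration only: the card does not say the model was trained or evaluated this way, and the helper name `split_into_windows` and the default `window` and `stride` values are hypothetical.

```python
def split_into_windows(token_ids, window=510, stride=255):
    """Split a list of token ids into overlapping windows.

    window=510 leaves room for the [CLS] and [SEP] tokens that BERT adds.
    Hypothetical helper; not part of the model repository.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # Short inputs fit in a single window.
    if len(token_ids) <= window:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window]))
        # Stop once the current window reaches the end of the input.
        if start + window >= len(token_ids):
            break
        start += stride
    return windows

# Stand-in token ids; real ids would come from the tokenizer.
chunks = split_into_windows(list(range(1200)))
print(len(chunks), [len(c) for c in chunks])
```

A conservative way to combine the per-window results would be to flag the slice if any window is predicted vulnerable, though that choice is also an assumption and would need validation against the dataset.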