BERT LLVM IR Vulnerability Detection

This model is a BERT-based binary classifier that labels program slices, represented in LLVM IR, as vulnerable or non-vulnerable.

Model

Base model: bert-base-uncased

Task: Binary classification

Labels

0 → Non-vulnerable
1 → Vulnerable

Repository

Training and preprocessing code: https://github.com/forwardtech/bert-llvm-ir-vulnerability-detection

Dataset

This model was trained on data derived from previously published vulnerability detection research datasets.

This repository does not redistribute the dataset. Please obtain the dataset from the original sources referenced below and follow their licensing terms.

References

Original Dataset

Zhen Li, Deqing Zou, Shouhuai Xu, Zhaoxuan Chen, Yawei Zhu, and Hai Jin.
VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector.
IEEE Transactions on Dependable and Secure Computing.

@article{vuldeelocator,
  title={VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector},
  author={Li, Zhen and Zou, Deqing and Xu, Shouhuai and Chen, Zhaoxuan and Zhu, Yawei and Jin, Hai},
  journal={IEEE Transactions on Dependable and Secure Computing},
  year={2021}
}

Follow-up Research

Andrew Arash Mahyari.
Harnessing the Power of LLMs in Source Code Vulnerability Detection.

@article{mahyari_llm_vuln_detection,
  title={Harnessing the Power of LLMs in Source Code Vulnerability Detection},
  author={Mahyari, Andrew Arash},
  journal={IEEE},
  year={2023}
}

Intended Use

This model is intended for research and experimentation in vulnerability detection using LLVM IR.

It should not be used as a standalone production vulnerability scanner.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "chidamtek/bert-llvm-ir-vulnerability-detection"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

code = "LLVM IR snippet here"

# Truncate inputs that exceed BERT's maximum sequence length (512 tokens).
inputs = tokenizer(code, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

prediction = torch.argmax(outputs.logits, dim=-1).item()
print(prediction)  # 0 = non-vulnerable, 1 = vulnerable
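If you also want a confidence score, apply a softmax over the logits. The sketch below uses a hypothetical hard-coded logits tensor in place of a real `outputs.logits` so it runs without downloading the model; the label names follow the 0/1 mapping above.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits standing in for outputs.logits
# (batch of 1, two classes: index 0 = non-vulnerable, 1 = vulnerable).
logits = torch.tensor([[-1.2, 2.3]])

# Softmax converts the logits into class probabilities that sum to 1.
probs = F.softmax(logits, dim=-1)
label_names = {0: "non-vulnerable", 1: "vulnerable"}

pred = int(torch.argmax(probs, dim=-1).item())
confidence = probs[0, pred].item()
print(label_names[pred], round(confidence, 3))  # vulnerable 0.971
```

With a real model, replace the hard-coded tensor with `outputs.logits` from the usage snippet above.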