---
base_model: allenai/scibert_scivocab_uncased
tags:
- generated_from_trainer
- cybersecurity
metrics:
- accuracy
model-index:
- name: my_awesome_model
  results: []
pipeline_tag: text-classification
---


# vuln-cat

This model is a fine-tuned version of [allenai/scibert_scivocab_uncased](https://huggingface.co/allenai/scibert_scivocab_uncased) on a dataset whose name was not recorded by the Trainer.
After the final (8th) training epoch, it achieves the following results on the evaluation set:
- Loss: 0.5132
- Accuracy: 0.9034

## Model description

vuln-cat is a text classification model fine-tuned from SciBERT. It assigns a CVE summary to one of 11 vulnerability types, using the following class labels:
```
[
    'csrf',
    'directory_traversal',
    'file_inclusion', 
    'input_validation', 
    'memory_corruption', 
    'open_redirect', 
    'overflow', 
    'sql_injection', 
    'ssrf',
    'xss', 
    'xxe'
]
```

## Usage
```python
from transformers import pipeline

# Example CVE summary to classify.
text = 'A path traversal exists in a specific dll of Trend Micro Mobile Security (Enterprise) 9.8 SP5 which could allow an authenticated remote attacker to delete arbitrary files.'

# padding, truncation, and max_length are forwarded to the tokenizer,
# so summaries longer than 512 tokens are truncated rather than rejected.
classifier = pipeline(
    "text-classification",
    model="conflick0/vuln-cat",
    padding=True,
    truncation=True,
    max_length=512,
)

classifier(text)
# [{'label': 'directory_traversal', 'score': 0.9969494938850403}]
```
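
When classifying many CVE summaries at once, it can be useful to separate confident predictions from those that deserve manual review. The following is a minimal sketch and not part of the original model card: the `triage` helper and its `0.9` threshold are hypothetical, and the only assumption about the classifier is that it returns one `{'label', 'score'}` dict per input string, as in the output shown above.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Dict[str, object]


def triage(
    classifier: Callable[[List[str]], List[Prediction]],
    summaries: List[str],
    threshold: float = 0.9,  # hypothetical cut-off; tune it on your own data
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split summaries into (confident, needs_review) lists of (summary, label)."""
    confident: List[Tuple[str, str]] = []
    needs_review: List[Tuple[str, str]] = []
    # The classifier is called once with the whole list and is expected
    # to return one prediction dict per summary, in the same order.
    for summary, pred in zip(summaries, classifier(summaries)):
        bucket = confident if pred["score"] >= threshold else needs_review
        bucket.append((summary, pred["label"]))
    return confident, needs_review
```

With the `classifier` built above, `triage(classifier, [text])` would place the example summary in the confident list, since its reported score is about 0.997.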

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 8
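
For anyone trying to reproduce the run, the list above can be expressed as keyword arguments for `transformers.TrainingArguments`. This is a sketch under assumptions rather than the original training script: it assumes a single device (so the reported batch sizes equal the per-device values), and `build_training_args` and its `output_dir` default are hypothetical names chosen for illustration.

```python
# Hyperparameters from the list above, keyed by the corresponding
# transformers.TrainingArguments keyword names.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumes a single device
    "per_device_eval_batch_size": 16,   # assumes a single device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 8,
}


def build_training_args(output_dir: str = "vuln-cat"):
    """Build TrainingArguments; output_dir is a hypothetical local path."""
    # Imported lazily so the dictionary above can be inspected
    # without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```

The dataset, tokenization, and metric function are not described in this card, so they would still have to be supplied separately to a `Trainer`.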

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 1.0   | 88   | 0.3975          | 0.9006   |
| No log        | 2.0   | 176  | 0.3922          | 0.9034   |
| No log        | 3.0   | 264  | 0.4732          | 0.9034   |
| No log        | 4.0   | 352  | 0.5226          | 0.8949   |
| No log        | 5.0   | 440  | 0.4903          | 0.9034   |
| 0.0513        | 6.0   | 528  | 0.5203          | 0.9062   |
| 0.0513        | 7.0   | 616  | 0.5192          | 0.8949   |
| 0.0513        | 8.0   | 704  | 0.5132          | 0.9034   |
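
The reported results come from the last row, which is not the best row on either metric. A short sketch that reads the table back makes this explicit; the rows are copied from the table above, and the variable names are illustrative only.

```python
# (epoch, validation_loss, accuracy) rows copied from the table above.
RESULTS = [
    (1, 0.3975, 0.9006),
    (2, 0.3922, 0.9034),
    (3, 0.4732, 0.9034),
    (4, 0.5226, 0.8949),
    (5, 0.4903, 0.9034),
    (6, 0.5203, 0.9062),
    (7, 0.5192, 0.8949),
    (8, 0.5132, 0.9034),
]

best_loss = min(RESULTS, key=lambda row: row[1])  # lowest validation loss
best_acc = max(RESULTS, key=lambda row: row[2])   # highest accuracy
final = RESULTS[-1]                               # checkpoint reported in this card

print(f"lowest validation loss: epoch {best_loss[0]} ({best_loss[1]})")
print(f"highest accuracy:       epoch {best_acc[0]} ({best_acc[2]})")
print(f"final epoch:            epoch {final[0]} (loss {final[1]}, accuracy {final[2]})")
```

Validation loss is lowest at epoch 2 and rises afterwards while accuracy stays within about one percentage point, a pattern consistent with overfitting in the later epochs.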


### Framework versions

- Transformers 4.38.2
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2