---
license: cc-by-nc-4.0
language:
  - en
tags:
  - cybersecurity
widget:
  - text: >-
      Native API functions such as <mask> may be directly invoked via system
      calls (syscalls). However, these features are also commonly exposed to
      user-mode applications through interfaces and libraries.
    example_title: Native API functions
  - text: >-
      One way to explicitly assign the PPID of a new process is through the
      <mask> API call, which includes a parameter for defining the PPID.
    example_title: Assigning the PPID of a new process
  - text: >-
      Enable Safe DLL Search Mode to ensure that system DLLs in more restricted
      directories (e.g., %<mask>%) are prioritized over DLLs in less secure
      locations such as a user’s home directory.
    example_title: Enable Safe DLL Search Mode
  - text: >-
      GuLoader is a file downloader that has been active since at least
      December 2019. It has been used to distribute a variety of <mask>,
      including NETWIRE, Agent Tesla, NanoCore, and FormBook.
    example_title: GuLoader is a file downloader
---

# SecureBERT+

**SecureBERT+** is an enhanced version of [SecureBERT](https://huggingface.co/ehsanaghaei/SecureBERT), trained on a corpus **five times larger** than its predecessor using **8×A100 GPUs**. It delivers an **average 6% improvement** in Masked Language Modeling (MLM) performance over SecureBERT, a significant advancement in language understanding and representation within the cybersecurity domain.

---

## Dataset

SecureBERT+ was trained on a large-scale corpus of cybersecurity-related text, substantially expanding the coverage and depth of the original SecureBERT training data.

![dataset](https://cdn-uploads.huggingface.co/production/uploads/6340b0bd77fd972573eb2f9b/pO-v6961YI1D0IPcm0027.png)

---

## Using SecureBERT+

SecureBERT+ is available on the [Hugging Face Hub](https://huggingface.co/ehsanaghaei/SecureBERT_Plus).

### Load the Model

```python
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("ehsanaghaei/SecureBERT_Plus")
model = RobertaModel.from_pretrained("ehsanaghaei/SecureBERT_Plus")

# Encode a sample sentence and extract token-level embeddings
inputs = tokenizer("This is SecureBERT Plus!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state
```

### Masked Language Modeling

Use the code below to predict masked words in text; mark each position to fill with the `<mask>` token:

```python
# pip install transformers torch tokenizers
import torch
from transformers import RobertaTokenizerFast, RobertaForMaskedLM

tokenizer = RobertaTokenizerFast.from_pretrained("ehsanaghaei/SecureBERT_Plus")
model = RobertaForMaskedLM.from_pretrained("ehsanaghaei/SecureBERT_Plus")

def predict_mask(sent, tokenizer, model, topk=10, print_results=True):
    """Return the top-k candidate tokens for every <mask> in `sent`."""
    token_ids = tokenizer.encode(sent, return_tensors="pt")
    # Positions of all <mask> tokens in the encoded sequence
    masked_pos = (token_ids.squeeze() == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].tolist()
    words = []

    with torch.no_grad():
        output = model(token_ids)

    for pos in masked_pos:
        logits = output.logits[0, pos]
        # Decode each of the top-k candidate token ids individually
        top_tokens = torch.topk(logits, k=topk).indices.tolist()
        predictions = [tokenizer.decode([i]).strip() for i in top_tokens]
        words.append(predictions)
        if print_results:
            print(f"Mask predictions: {predictions}")

    return words
```
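As a quick check, `predict_mask` can be applied to one of the widget sentences from the top of this card; the exact candidates returned will vary with the checkpoint and `topk`:

```python
# Recover the masked token in a cybersecurity sentence (predictions will vary).
sent = ("One way to explicitly assign the PPID of a new process is through the "
        "<mask> API call, which includes a parameter for defining the PPID.")
predictions = predict_mask(sent, tokenizer, model, topk=5)
```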
## Limitations & Risks

- **Domain-Specific Scope:** SecureBERT+ is optimized for cybersecurity text and may not generalize as well to unrelated domains.
- **Bias in Training Data:** The training corpus was collected from online sources and may contain biases, outdated knowledge, or inaccuracies.
- **Potential Misuse:** While designed for defensive research, the model could be misapplied to generate adversarial content or obfuscate malicious behavior.
- **Resource-Intensive:** The larger dataset and model training process require significant compute resources, which may limit reproducibility for smaller research teams.
- **Evolving Threats:** The cybersecurity landscape evolves rapidly. Without regular retraining, the model may not capture emerging threats or terminology.

Users should apply SecureBERT+ responsibly, with appropriate oversight from cybersecurity professionals.

## Reference

```
@inproceedings{aghaei2023securebert,
  title={SecureBERT: A Domain-Specific Language Model for Cybersecurity},
  author={Aghaei, Ehsan and Niu, Xi and Shadid, Waseem and Al-Shaer, Ehab},
  booktitle={Security and Privacy in Communication Networks: 18th EAI International Conference, SecureComm 2022, Virtual Event, October 2022, Proceedings},
  pages={39--56},
  year={2023},
  organization={Springer}
}
```