Primus - a trendmicro-ailab Collection


updated Aug 9

(News) 70B Primus models: https://huggingface.co/collections/trendmicro-ailab/llama-primus-nemotron-70b-68066bf016241419a145a508


Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training

Paper • 2502.11191 • Published Feb 16 • 8

Note Start by reading the 🚀Primus Paper! To the best of our knowledge, we are the 🏄🏽‍♂️ first to release datasets covering cybersecurity pretraining, IFT, and reasoning distillation. Of course, we are also the first to pretrain an LLM on a large-scale cybersecurity corpus.
trendmicro-ailab/Llama-Primus-Base

Text Generation • 8B • Updated Mar 4 • 213 • 12

Note Based on Llama-3.1-8B-Instruct, continually pretrained on 2.77B tokens of cybersecurity text, achieving a 🚀15.88% improvement in the aggregated score across multiple cybersecurity benchmarks.
trendmicro-ailab/Llama-Primus-Merged

Text Generation • 8B • Updated Mar 4 • 811 • 13

Note Instruct model! Maintains nearly the same instruction-following capability as Llama-3.1-8B-Instruct while achieving a 🚀14.84% improvement across multiple cybersecurity benchmarks.
trendmicro-ailab/Llama-Primus-Reasoning

Text Generation • 8B • Updated Jun 2 • 466 • 13

Note Distilled on reasoning and reflection data from o1-preview for cybersecurity tasks, achieving a 🚀10% improvement on CISSP.
trendmicro-ailab/Primus-Seed

Viewer • Updated Aug 8 • 174k • 232 • 17

Note Includes high-quality cybersecurity texts manually collected from reputable sources such as Wikipedia, MITRE, cybersecurity company websites, CTI, and more.
trendmicro-ailab/Primus-FineWeb

Viewer • Updated Aug 9 • 3.39M • 92 • 17

Note Includes 2.57B tokens of cybersecurity texts filtered from FineWeb.
trendmicro-ailab/Primus-Instruct

Viewer • Updated Feb 20 • 835 • 213 • 6

Note Includes approximately 1K QA pairs covering common cybersecurity business scenarios.
trendmicro-ailab/Primus-Reasoning

Viewer • Updated Jun 2 • 4.89k • 167 • 12

Note Includes reasoning and reflection data generated by o1-preview on cybersecurity tasks for distillation.
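
The models and datasets listed above can presumably be loaded with the standard Hugging Face `transformers` and `datasets` libraries. The following is a minimal sketch, not the authors' documented usage: the helper names (`hub_url`, `load_model`, `load_dataset_split`) and the short keys are hypothetical, the `"train"` split is an assumption, and some repositories may require authentication or accepting terms before download. Only the repository IDs are taken from the listing.

```python
# Repository IDs copied from the collection listing; the short keys are illustrative.
MODEL_REPOS = {
    "base": "trendmicro-ailab/Llama-Primus-Base",
    "merged": "trendmicro-ailab/Llama-Primus-Merged",
    "reasoning": "trendmicro-ailab/Llama-Primus-Reasoning",
}
DATASET_REPOS = {
    "seed": "trendmicro-ailab/Primus-Seed",
    "fineweb": "trendmicro-ailab/Primus-FineWeb",
    "instruct": "trendmicro-ailab/Primus-Instruct",
    "reasoning": "trendmicro-ailab/Primus-Reasoning",
}


def hub_url(repo_id: str, kind: str = "model") -> str:
    """Build the Hub page URL for a model or dataset repository."""
    prefix = "datasets/" if kind == "dataset" else ""
    return f"https://huggingface.co/{prefix}{repo_id}"


def load_model(variant: str = "merged"):
    """Download the tokenizer and weights for one model (about 8B parameters)."""
    # Imported lazily so the constants above are usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = MODEL_REPOS[variant]
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)
    return tokenizer, model


def load_dataset_split(name: str = "instruct", split: str = "train"):
    """Load one dataset split; the split name is an assumption, not confirmed."""
    from datasets import load_dataset

    return load_dataset(DATASET_REPOS[name], split=split)
```

Calling `load_model("merged")` would fetch the instruct model, and `load_dataset_split("instruct")` the roughly 1K QA pairs, provided the assumed split exists and access has been granted.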
