Anvilogic - Where AI Meets Cybersecurity
Welcome to the official Hugging Face organization for Anvilogic's advanced cybersecurity AI models!
Founded in 2019, Anvilogic specializes in AI-driven threat detection and automation, enhancing Security Operations Center (SOC) capabilities with scalable, data-driven solutions.
Typosquatting Collection
Typosquatting is a form of cyber attack in which malicious actors register fake domain names that are visually or phonetically similar to legitimate ones, intending to deceive users into visiting them. This collection provides models, datasets, and demos for detecting and flagging typosquatted domains. It comprises the following:
Models
- Embedder: This model maps domain names to vector representations (embeddings) and is used to mine similar domains.
- Cross-Encoder: This model compares two domain names and determines whether one is a typosquat of the other.
- T5 Typosquat Detection: This model is derived from T5 and trained on a new task. The input is the prefix "Is the first domain a typosquat of the second:" followed by TYPOSQUAT_DOMAIN and LEGITIMATE_DOMAIN, in that order.
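The T5 input described above can be sketched as a small prompt builder. This is only an illustration: the function name `build_prompt` is hypothetical, and the single-space separator between the prefix and the two domains is an assumption, since the exact separator is not specified here.

```python
# Task prefix quoted from the model description above.
PREFIX = "Is the first domain a typosquat of the second:"


def build_prompt(typosquat_domain: str, legitimate_domain: str) -> str:
    """Build a T5 input string (hypothetical helper).

    Assumption: the prefix and the two domains are joined by single
    spaces. The real model may expect a different separator.
    """
    # Order matters: suspected typosquat first, legitimate domain second.
    return f"{PREFIX} {typosquat_domain.strip()} {legitimate_domain.strip()}"


prompt = build_prompt("g00gle.com", "google.com")
print(prompt)
```

Keeping the suspected typosquat first and the legitimate domain second matches the wording of the prefix, which asks about "the first domain" relative to "the second".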
Datasets
- Embedder Training Dataset: A dataset formatted to train the embedding model, containing pairs of (Anchor, Positive) domain examples.
- Cross-Encoder Training Dataset: A dataset formatted to train the Cross-Encoder model with (Anchor, Positive, label) samples.
- T5 Training Dataset: A dataset formatted to train the T5 model with (prompt,response) pairs.
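To make the first two row formats concrete, here is a minimal sketch of how one labeled domain pair might be turned into an Embedder row and a Cross-Encoder row. Everything here is an assumption for illustration: the `LabeledPair` type, the helper names, the choice of the legitimate domain as the anchor, and the 0/1 integer label are hypothetical and may not match the actual dataset columns.

```python
from typing import NamedTuple, Optional, Tuple


class LabeledPair(NamedTuple):
    """Hypothetical source record: one candidate compared to one legitimate domain."""
    legitimate: str
    candidate: str
    is_typosquat: bool


def to_embedder_row(pair: LabeledPair) -> Optional[Tuple[str, str]]:
    """Return an (anchor, positive) row, or None for non-typosquat pairs.

    Assumption: the anchor is the legitimate domain and the positive is
    its typosquat; negative pairs produce no embedder row.
    """
    if not pair.is_typosquat:
        return None
    return (pair.legitimate, pair.candidate)


def to_cross_encoder_row(pair: LabeledPair) -> Tuple[str, str, int]:
    """Return an (anchor, positive, label) row with an assumed 0/1 label."""
    return (pair.legitimate, pair.candidate, int(pair.is_typosquat))


positive = LabeledPair("paypal.com", "paypa1.com", True)
negative = LabeledPair("paypal.com", "github.com", False)
print(to_embedder_row(positive))
print(to_cross_encoder_row(negative))
```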
Spaces
- Embedder Typosquat Detect: Retrieves, for a domain entered by the user, the most similar domains from a pool of 4,000 of the most common domains.
- Cross-Encoder (CE) Typosquat Detect: Allows users to compare two domains using the Cross-Encoder. The model outputs a probability of typosquatting.
- T5 Typosquat Detect: Allows users to compare two domains using the T5 model. The model outputs a boolean value indicating whether the first domain is a typosquat of the second.
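The retrieval step behind the Embedder Space can be pictured as a nearest-neighbor search over a pool of known domains. The sketch below is not the Embedder model: it swaps in a toy character-bigram embedding (`toy_embed`, a hypothetical stand-in) and a four-domain pool so that the example runs without downloading anything. Only the overall shape, embed then rank by cosine similarity, is meant to carry over.

```python
from collections import Counter
from math import sqrt


def toy_embed(domain: str) -> Counter:
    """Stand-in embedding: character-bigram counts (NOT the Embedder model)."""
    padded = f"^{domain.lower()}$"  # mark start and end of the name
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query: str, pool: list[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank pool domains by similarity to the query and keep the top_k."""
    q = toy_embed(query)
    scored = [(domain, cosine(q, toy_embed(domain))) for domain in pool]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Tiny illustrative pool; the Space itself uses about 4,000 common domains.
POOL = ["google.com", "github.com", "paypal.com", "amazon.com"]

for domain, score in most_similar("gooogle.com", POOL, top_k=2):
    print(f"{domain}\t{score:.3f}")
```

In a two-stage setup, candidates retrieved this way could then be passed to a pairwise model such as the Cross-Encoder for a final decision, although how the models are combined in practice is not stated here.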