DISTILBERT 🌎 + Typo Detection ✍❌✍✔

distilbert-base-multilingual-cased fine-tuned on the GitHub Typo Corpus for typo detection, framed as an NER-style token classification task

Details of the downstream task (Typo detection as NER)

Metrics on test set 📋

Metric      Score
F1          93.51
Precision   96.08
Recall      91.06
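
As a quick sanity check on the numbers above, F1 can be recomputed from precision and recall. This is a minimal sketch that assumes the scores are percentages and that F1 is the standard harmonic mean of the two; the card does not say how the metrics were computed, and the helper name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics table; assumed to be percentages.
f1 = f1_score(96.08, 91.06)
print(f"{f1:.2f}")
```

The result lands within rounding distance of the reported 93.51, which is what one would expect if the published precision and recall were themselves rounded.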

Model in action 🔨

Fast usage with pipelines 🧪

from transformers import pipeline

typo_checker = pipeline(
    "ner",
    model="mrm8488/distilbert-base-multi-cased-finetuned-typo-detection",
    tokenizer="mrm8488/distilbert-base-multi-cased-finetuned-typo-detection"
)

result = typo_checker("Adddd validation midelware")
result[1:-1]

# Output:
[{'entity': 'ok', 'score': 0.7128152847290039, 'word': 'add'},
 {'entity': 'typo', 'score': 0.5388424396514893, 'word': '##dd'},
 {'entity': 'ok', 'score': 0.94792640209198, 'word': 'validation'},
 {'entity': 'typo', 'score': 0.5839331746101379, 'word': 'mid'},
 {'entity': 'ok', 'score': 0.5195121765136719, 'word': '##el'},
 {'entity': 'ok', 'score': 0.7222476601600647, 'word': '##ware'}]

It works 🎉! The model flags the misspelled "Add" and "middleware".
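
The pipeline labels WordPiece tokens rather than whole words, so a typo may be flagged on only one piece of a word (for example `##dd` or `mid` in the output above). One possible way to get word-level results is to glue the `##` continuation pieces back together and flag a word if any of its pieces is labelled `typo`. The sketch below is an assumption about how one might post-process the output, not part of the model or of the transformers API; `merge_wordpieces` is a hypothetical helper, and the sample data is copied from the output shown above.

```python
from typing import Dict, List, Tuple


def merge_wordpieces(predictions: List[Dict]) -> List[Tuple[str, bool]]:
    """Group WordPiece tokens into words; a word is a typo if any piece is."""
    words: List[Tuple[str, bool]] = []
    for pred in predictions:
        piece = pred["word"]
        is_typo = pred["entity"] == "typo"
        if piece.startswith("##") and words:
            # Continuation piece: append to the previous word, keep any typo flag.
            prev_text, prev_flag = words[-1]
            words[-1] = (prev_text + piece[2:], prev_flag or is_typo)
        else:
            # A piece without the "##" prefix starts a new word.
            words.append((piece, is_typo))
    return words


# Token-level predictions copied from the example output above.
sample = [
    {"entity": "ok", "score": 0.7128152847290039, "word": "add"},
    {"entity": "typo", "score": 0.5388424396514893, "word": "##dd"},
    {"entity": "ok", "score": 0.94792640209198, "word": "validation"},
    {"entity": "typo", "score": 0.5839331746101379, "word": "mid"},
    {"entity": "ok", "score": 0.5195121765136719, "word": "##el"},
    {"entity": "ok", "score": 0.7222476601600647, "word": "##ware"},
]

flagged = [word for word, is_typo in merge_wordpieces(sample) if is_typo]
print(flagged)  # ['adddd', 'midelware']
```

Flagging a word when any single piece is a typo favors recall; a stricter rule (for instance, requiring a minimum score) would be an equally reasonable choice.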

Created by Manuel Romero/@mrm8488

Made with ♥ in Spain
