KenLM trained on PubMed
Scores how closely a text resembles text from a scientific paper.
Dataset
Trained on the Scientific papers dataset.
Installation
!git clone https://huggingface.co/marianna13/kenlm-pubmed
!pip install https://github.com/kpu/kenlm/archive/master.zip -q
Usage
Compute KenLM score:
import kenlm
model = kenlm.LanguageModel('kenlm-pubmed/pubmed.binary')
text = 'The 2019 novel coronavirus (COVID-19) is a newly emerged strain that has never been found in humans before. At present, the laboratory-based reverse transcription-polymerase chain reaction (RT-PCR) is the main method to confirm COVID-19 infection.'
print(model.score(text.lower())) # -84.2962646484375
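The lower the score, the better. Note that model.score returns the total log10 probability of the text, so longer texts also tend to score lower. A non-scientific text for comparison: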
text = 'Kudligi is a panchayat town in Vijayanagara district in the India state of Karnataka'
print(model.score(text.lower())) # -44.775997161865234
Corrupted text:
text = 'Comptition with thool-containiRg'
print(model.score(text.lower())) # -9.675569534301758
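Because the three texts above differ greatly in length, their raw scores are hard to compare directly. The sketch below divides each score by the number of tokens to get an average log10 probability per token. The helper name per_token_score is hypothetical and not part of kenlm, and counting whitespace-separated tokens plus one end-of-sentence token is an assumption about how the text is tokenized (kenlm's own perplexity method appears to normalize in a similar way). The scores are copied from the examples above so the block runs without the model.

```python
def per_token_score(total_log10_score: float, text: str) -> float:
    """Average log10 probability per token (hypothetical helper, not a kenlm API)."""
    # Assumption: tokens are whitespace-separated, plus one token for the
    # end-of-sentence marker that model.score() includes by default.
    n_tokens = len(text.split()) + 1
    return total_log10_score / n_tokens


# Texts and raw scores copied from the usage examples above.
examples = {
    "scientific": (
        "The 2019 novel coronavirus (COVID-19) is a newly emerged strain that "
        "has never been found in humans before. At present, the laboratory-based "
        "reverse transcription-polymerase chain reaction (RT-PCR) is the main "
        "method to confirm COVID-19 infection.",
        -84.2962646484375,
    ),
    "encyclopedic": (
        "Kudligi is a panchayat town in Vijayanagara district in the India "
        "state of Karnataka",
        -44.775997161865234,
    ),
    "corrupted": ("Comptition with thool-containiRg", -9.675569534301758),
}

normalized = {
    name: per_token_score(score, text.lower())
    for name, (text, score) in examples.items()
}

for name, value in normalized.items():
    print(f"{name}: {value:.3f}")
```

With the model loaded, the same helper can be applied as per_token_score(model.score(text.lower()), text.lower()). On a per-token scale, values closer to zero mean the model finds the text more likely, which is the opposite direction from the raw totals.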