KenLM trained on PubMed

Estimates how closely a text resembles the language of scientific papers.

Dataset

Trained on the Scientific papers dataset (PubMed articles).

Installation

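In a notebook (drop the leading ! in a regular shell). Git LFS is needed to fetch the model file:

!git lfs install
!git clone https://huggingface.co/marianna13/kenlm-pubmed
!pip install https://github.com/kpu/kenlm/archive/master.zip -q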
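If git-lfs was not set up before cloning, the clone still succeeds, but `pubmed.binary` is only a small text pointer file and KenLM will not be able to load it. A minimal check, assuming the standard Git LFS pointer format (a text file whose first line is `version https://git-lfs.github.com/spec/v1`); `is_lfs_pointer` is a hypothetical helper, not part of kenlm:

```python
from pathlib import Path

# First line of every Git LFS pointer file
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: str) -> bool:
    """Return True if `path` holds a Git LFS pointer instead of the real file."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    model_path = "kenlm-pubmed/pubmed.binary"
    if Path(model_path).exists() and is_lfs_pointer(model_path):
        print("Model file is an LFS pointer: run `git lfs install` and "
              "`git lfs pull` inside kenlm-pubmed")
```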

Usage

Compute the KenLM score (the log10 probability of the text):

import kenlm
# Load the binary model from the cloned repository
model = kenlm.Model('kenlm-pubmed/pubmed.binary')
text = 'The 2019 novel coronavirus (COVID-19) is a newly emerged strain that has never been found in humans before. At present, the laboratory-based reverse transcription-polymerase chain reaction (RT-PCR) is the main method to confirm COVID-19 infection.'
print(model.score(text.lower())) # -84.2962646484375
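`score` sums log10 probabilities over every word plus the end-of-sentence token, so it keeps falling as the text gets longer. A length-normalized alternative is perplexity. The sketch below assumes the formula used by the kenlm Python wrapper's `Model.perplexity` (10 raised to minus the score divided by the number of whitespace-separated words plus one); `perplexity_from_score` is a hypothetical helper that applies it to a score you already have. With a loaded model you can call `model.perplexity(text.lower())` directly.

```python
def perplexity_from_score(score: float, text: str) -> float:
    """Turn a KenLM log10 score for `text` into perplexity.

    Mirrors kenlm's Model.perplexity: divide by the number of
    whitespace-separated words plus one for the </s> token.
    """
    n_tokens = len(text.split()) + 1
    return 10.0 ** (-score / n_tokens)


text = ('The 2019 novel coronavirus (COVID-19) is a newly emerged strain that has never '
        'been found in humans before. At present, the laboratory-based reverse '
        'transcription-polymerase chain reaction (RT-PCR) is the main method to confirm '
        'COVID-19 infection.')
# Score reported above for text.lower()
print(round(perplexity_from_score(-84.2962646484375, text.lower())))  # 220
```

Lower perplexity means the text is closer to the training data, and unlike the raw score it is more comparable across texts of different lengths.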

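Higher (closer to 0) scores mean the model finds the text more probable. Because the score is summed over every word, shorter texts such as the two below score closer to 0 than the longer passage above, so compare raw scores only between texts of similar length. A non-scientific sentence: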

text = 'Kudligi is a panchayat town in Vijayanagara district in the India state of Karnataka'
print(model.score(text.lower())) # -44.775997161865234

Corrupted text:

text = 'Comptition with thool-containiRg'
print(model.score(text.lower())) # -9.675569534301758
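To use the model as a filter for scientific-looking text, one option is to rank candidates by perplexity and keep those under a cutoff. A minimal sketch, assuming any object with kenlm's `perplexity(sentence)` method (such as the `model` loaded above); `filter_scientific` and the `max_perplexity` cutoff are hypothetical, and a sensible cutoff has to be chosen on your own data:

```python
from typing import Iterable, List, Protocol, Tuple


class SupportsPerplexity(Protocol):
    # kenlm.Model provides this method
    def perplexity(self, sentence: str) -> float: ...


def filter_scientific(
    model: SupportsPerplexity,
    texts: Iterable[str],
    max_perplexity: float,
) -> List[Tuple[float, str]]:
    """Return (perplexity, text) pairs under the cutoff, lowest perplexity first.

    Texts are lowercased before scoring, as in the examples above.
    max_perplexity is a hypothetical cutoff, not a value from this model card.
    """
    scored = [(model.perplexity(t.lower()), t) for t in texts]
    kept = [pair for pair in scored if pair[0] <= max_perplexity]
    return sorted(kept, key=lambda pair: pair[0])


# With the real model (requires kenlm and the downloaded file):
# import kenlm
# model = kenlm.Model('kenlm-pubmed/pubmed.binary')
# for ppl, text in filter_scientific(model, texts, max_perplexity=500.0):
#     print(f"{ppl:.1f}\t{text}")
```

Passing the model in, rather than loading it inside the function, keeps the large binary loaded once and lets the filter be tested with a stand-in object.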