Model information:
This model is distilbert-base-uncased fine-tuned on radiology report texts from the MIMIC-III database. It was trained for binary text classification in order to benchmark it against a selection of other BERT variants on the classification of MIMIC-III radiology report texts. Reports linked to an ICD-9 diagnosis code for lung cancer were labelled 1, and a random sample of reports not linked to any type of cancer diagnosis code were labelled 0.
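The labelling rule described above can be sketched roughly as follows. This is only an illustration and not the authors' actual preprocessing: the prefix `162` (malignant neoplasm of trachea, bronchus, and lung) and the 140-239 neoplasm range used to detect "any cancer code" are assumptions, as are the helper names `is_neoplasm` and `label_for_codes`. Codes are assumed to be stored without the decimal point, as in MIMIC-III.

```python
from typing import Iterable, Optional

# Assumption: lung cancer is identified by ICD-9 category 162.
LUNG_CANCER_PREFIXES = ("162",)


def is_neoplasm(code: str) -> bool:
    """True for numeric ICD-9 codes in the neoplasm chapter (140-239).

    Assumption: this range stands in for "any type of cancer diagnosis code".
    V and E codes start with a letter and are never matched here.
    """
    head = code[:3]
    return head.isdigit() and 140 <= int(head) <= 239


def label_for_codes(codes: Iterable[str]) -> Optional[int]:
    """Map the ICD-9 codes linked to one report to a class label.

    Returns 1 for lung cancer, 0 for no cancer code at all, and None when
    another cancer code is present (the report belongs to neither class).
    """
    cleaned = [c.strip().replace(".", "") for c in codes]
    if any(c.startswith(LUNG_CANCER_PREFIXES) for c in cleaned):
        return 1
    if any(is_neoplasm(c) for c in cleaned):
        return None
    return 0
```

In this sketch the negative class would then be drawn as a random sample from the reports that map to 0.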
Intended uses:
This model is intended to classify texts to identify the presence of lung cancer. It predicts a label of 0 or 1, where 1 corresponds to reports linked to a lung cancer diagnosis code.
Limitations:
Note that the dataset and model may not be fully representative or suitable for all needs. It is recommended that the paper for the dataset and the base model card be reviewed before use.
How to use:
Load the tokenizer and model from the transformers library using the following checkpoint:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("sarahmiller137/distilbert-base-uncased-ft-m3-lc")
model = AutoModel.from_pretrained("sarahmiller137/distilbert-base-uncased-ft-m3-lc")
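`AutoModel` returns the encoder without a classification head, so it yields hidden states and not labels. To obtain the 0/1 prediction described under intended uses, something like the following sketch may help. It assumes the checkpoint ships a two-class sequence-classification head; if it does not, transformers will initialise a random head and print a warning. The helper names `softmax` and `classify_report` are illustrative only.

```python
import math
from typing import List, Sequence

CHECKPOINT = "sarahmiller137/distilbert-base-uncased-ft-m3-lc"


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_report(text: str) -> dict:
    """Return the predicted label (0 or 1) and class probabilities for one report.

    Assumption: the checkpoint contains a fine-tuned classification head.
    """
    # Imported lazily so softmax() can be used without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)
    model.eval()

    # DistilBERT accepts at most 512 tokens, so longer reports are truncated.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probs = softmax(logits)
    return {"label": probs.index(max(probs)), "probabilities": probs}
```

Calling `classify_report(report_text)` on a single report string downloads the checkpoint on first use. For many reports, loading the tokenizer and model once outside the function would be more efficient.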