TinyBioBERT is a distilled version of BioBERT, trained for 100k steps with a total batch size of 192 on the PubMed dataset.
This model uses a distillation method called ‘transformer-layer distillation’, which is applied to each layer of the student to align its attention maps and hidden states with those of the teacher.
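The per-layer alignment can be illustrated with a minimal NumPy sketch. It assumes a TinyBERT-style formulation: mean squared error on the attention maps, plus mean squared error on the hidden states after a projection (`proj`, learnable in real training) that maps the student width to the teacher width. The function names, toy shapes, and equal weighting of the two terms are illustrative assumptions, not the authors' training code.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def layer_distillation_loss(student_attn, teacher_attn,
                            student_hidden, teacher_hidden, proj):
    """Loss for one (student layer, teacher layer) pair.

    student_attn, teacher_attn: (heads, seq, seq) attention maps
    student_hidden:             (seq, d_student) hidden states
    teacher_hidden:             (seq, d_teacher) hidden states
    proj:                       (d_student, d_teacher) projection matrix
    """
    attn_loss = mse(student_attn, teacher_attn)
    # Project the student states so both sides have the teacher's width.
    hidden_loss = mse(student_hidden @ proj, teacher_hidden)
    return attn_loss + hidden_loss

# Toy example with made-up sizes (not the real model dimensions).
rng = np.random.default_rng(0)
heads, seq, d_student, d_teacher = 4, 8, 16, 32
student_attn = rng.random((heads, seq, seq))
teacher_attn = rng.random((heads, seq, seq))
student_hidden = rng.standard_normal((seq, d_student))
teacher_hidden = rng.standard_normal((seq, d_teacher))
proj = rng.standard_normal((d_student, d_teacher)) * 0.01

loss = layer_distillation_loss(student_attn, teacher_attn,
                               student_hidden, teacher_hidden, proj)
print(f"layer loss: {loss:.4f}")
```

In a full setup this term would be summed over all student layers, each paired with a chosen teacher layer; how the layers are paired is a design choice not specified here.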
This model uses 4 hidden layers with a hidden dimension size and an embedding size of 312, resulting in a total of 15M parameters. Because of its small hidden dimension size, the student cannot be initialised from the teacher's weights, so it uses random initialisation.
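As a rough sanity check on the parameter count, the sketch below tallies the weights of a BERT-style encoder. The hidden size (312), feed-forward size (1,200), vocabulary size (28,996) and maximum position (512) are assumptions borrowed from the 4-layer TinyBERT layout and BioBERT's cased vocabulary, not values confirmed by the authors. With them the total comes out at roughly 14M, the same order as the 15M quoted above.

```python
def bert_param_count(vocab, hidden, layers, ffn, max_pos=512, type_vocab=2):
    """Approximate parameter count of a BERT-style encoder with a pooler."""
    # Word, position and token-type embeddings, plus one LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    # Q, K, V and output projections (weights + biases), plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + 2 * hidden
    # Two feed-forward matrices with biases, plus LayerNorm.
    feed_forward = 2 * hidden * ffn + ffn + hidden + 2 * hidden
    # Pooler dense layer applied to the first token.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler

# Assumed TinyBERT-like configuration (hypothetical values, see above).
student = bert_param_count(vocab=28996, hidden=312, layers=4, ffn=1200)
print(f"student parameters: {student / 1e6:.1f}M")
```

Under these assumptions the token embeddings account for most of the parameters, which is why shrinking the hidden dimension has such a large effect on model size.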
If you use this model, please consider citing the following paper:
@article{rohanian2023effectiveness,
title={On the effectiveness of compact biomedical transformers},
author={Rohanian, Omid and Nouriborji, Mohammadmahdi and Kouchaki, Samaneh and Clifton, David A},
journal={Bioinformatics},
volume={39},
number={3},
pages={btad103},
year={2023},
publisher={Oxford University Press}
}