---
language:
- it
tags:
- Biomedical Language Modeling
widget:
- text: >-
    L'asma allergica è una patologia dell'[MASK] respiratorio causata dalla presenza di allergeni responsabili dell'infiammazione dell'albero bronchiale.
  example_title: Example 1
- text: >-
    Il pancreas produce diversi [MASK] molto importanti tra i quali l'insulina e il glucagone.
  example_title: Example 2
- text: >-
    Il GABA è un amminoacido ed è il principale neurotrasmettitore inibitorio del [MASK].
  example_title: Example 3
datasets:
- IVN-RIN/BioBERT_Italian
---

🤗 + 📚🩺🇮🇹 + 📖🧑‍⚕️ + 🌐⚕️ = **MedBIT-r3-plus**

From this repository you can download the **MedBIT-r3-plus** (Medical BERT for ITalian) checkpoint.

**MedBIT-r3-plus** is built on top of [BioBIT](https://huggingface.co/IVN-RIN/bioBIT) and further pretrained on a corpus of medical textbooks used in the formal education and specialized training of medical doctors. These textbooks were either written directly by Italian authors or translated by professional human translators. The corpus amounts to 100 MB of data. Such comprehensive collections of medical concepts can shape how biomedical knowledge is encoded in language models, and they have the advantage of being natively available in Italian rather than machine-translated.

Online healthcare information is another source of biomedical text that is commonly available in many less-resourced languages. We therefore also gathered an additional 100 MB of web-crawled data from reliable Italian health-related websites. More details can be found in the paper.

**MedBIT-r3-plus** has been evaluated on 3 downstream tasks: **NER** (Named Entity Recognition), extractive **QA** (Question Answering), and **RE** (Relation Extraction).
Here are the results, summarized:

- NER:
  - [BC2GM](http://refhub.elsevier.com/S1532-0464(23)00152-1/sb32) = 81.87%
  - [BC4CHEMD](http://refhub.elsevier.com/S1532-0464(23)00152-1/sb35) = 80.68%
  - [BC5CDR(CDR)](http://refhub.elsevier.com/S1532-0464(23)00152-1/sb31) = 81.97%
  - [BC5CDR(DNER)](http://refhub.elsevier.com/S1532-0464(23)00152-1/sb31) = 76.32%
  - [NCBI_DISEASE](http://refhub.elsevier.com/S1532-0464(23)00152-1/sb33) = 63.36%
  - [SPECIES-800](http://refhub.elsevier.com/S1532-0464(23)00152-1/sb34) = 63.90%
- QA:
  - [BioASQ 4b](http://refhub.elsevier.com/S1532-0464(23)00152-1/sb30) = 68.21%
  - [BioASQ 5b](http://refhub.elsevier.com/S1532-0464(23)00152-1/sb30) = 77.89%
  - [BioASQ 6b](http://refhub.elsevier.com/S1532-0464(23)00152-1/sb30) = 75.28%
- RE:
  - [CHEMPROT](http://refhub.elsevier.com/S1532-0464(23)00152-1/sb36) = 38.82%
  - [BioRED](http://refhub.elsevier.com/S1532-0464(23)00152-1/sb37) = 67.62%

[Check the full paper](https://www.sciencedirect.com/science/article/pii/S1532046423001521) for further details, and feel free to contact us if you have any inquiries!
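The widget examples in this card's metadata are masked-language-modeling (fill-mask) prompts. The following is a minimal sketch of running them locally, not an official usage snippet. It assumes the checkpoint is published on the Hugging Face Hub as `IVN-RIN/MedBIT-r3-plus` (inferred from the BioBIT link, not stated in this card) and that the `transformers` library plus a backend such as PyTorch are installed. `WIDGET_EXAMPLES`, `load_fill_mask`, and `top_tokens` are hypothetical helper names introduced for illustration.

```python
from typing import Callable, Dict, List

# Assumed Hugging Face Hub id for this checkpoint; adjust if the repository name differs.
MODEL_ID = "IVN-RIN/MedBIT-r3-plus"

# The three widget prompts from this model card (Italian, each with one [MASK] token).
WIDGET_EXAMPLES = [
    "L'asma allergica è una patologia dell'[MASK] respiratorio causata dalla presenza "
    "di allergeni responsabili dell'infiammazione dell'albero bronchiale.",
    "Il pancreas produce diversi [MASK] molto importanti tra i quali l'insulina e il glucagone.",
    "Il GABA è un amminoacido ed è il principale neurotrasmettitore inibitorio del [MASK].",
]


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a transformers fill-mask pipeline; downloads the checkpoint on first use."""
    # Imported lazily so that top_tokens below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_tokens(fill_mask: Callable[[str], List[Dict]], text: str, k: int = 3) -> List[str]:
    """Return the k highest-scoring candidate tokens for the [MASK] position in `text`.

    `fill_mask` is any callable returning a list of dicts with "score" and "token_str"
    keys, which is the shape a transformers fill-mask pipeline returns for a single mask
    (5 candidates by default).
    """
    predictions = sorted(fill_mask(text), key=lambda p: p["score"], reverse=True)
    return [p["token_str"].strip() for p in predictions[:k]]


# Example (needs transformers, a backend such as PyTorch, and network access):
#   fill_mask = load_fill_mask()
#   for prompt in WIDGET_EXAMPLES:
#       print(top_tokens(fill_mask, prompt))
```

Because `top_tokens` accepts any callable with the pipeline's output shape, it can be exercised with a stub instead of the real model. Using the checkpoint for the NER, QA, and RE tasks listed above would instead require task-specific fine-tuning, which this sketch does not cover.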