# MatSciBERT
## A Materials Domain Language Model for Text Mining and Information Extraction
This is the pretrained model presented in [MatSciBERT: A materials domain language model for text mining and information extraction](https://rdcu.be/cMAp5), which is a BERT model trained on materials science research papers.
The training corpus comprises papers covering a broad range of material categories: alloys, glasses, metallic glasses, cement, and concrete. We have utilised the abstracts and, when available, the full text of the papers. All the research papers were downloaded from [ScienceDirect](https://www.sciencedirect.com/) using the [Elsevier API](https://dev.elsevier.com/). The detailed methodology is given in the paper.
The code for pretraining and fine-tuning on downstream tasks is available on [GitHub](https://github.com/m3rg-repo/MatSciBERT).
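
A minimal usage sketch with the Hugging Face `transformers` library is shown below. The model identifier `m3rg-iitd/matscibert` is assumed here; please refer to the GitHub repository or the paper for the exact hub name and the recommended preprocessing.

```python
# Minimal sketch: load MatSciBERT and obtain contextual embeddings.
# The hub identifier "m3rg-iitd/matscibert" is an assumption; check the
# GitHub repository for the exact model name and preprocessing steps.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("m3rg-iitd/matscibert")
model = AutoModel.from_pretrained("m3rg-iitd/matscibert")

sentences = ["SiO2 is a common network former in silicate glasses."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

outputs = model(**inputs)
# outputs.last_hidden_state contains the token-level contextual embeddings,
# which can be fed to downstream task heads (NER, classification, etc.).
print(outputs.last_hidden_state.shape)
```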
If you find this useful in your research, please consider citing:
```
@article{gupta_matscibert_2022,
  title   = "{MatSciBERT}: A Materials Domain Language Model for Text Mining and Information Extraction",
  author  = "Gupta, Tanishq and
             Zaki, Mohd and
             Krishnan, N. M. Anoop and
             Mausam",
  year    = "2022",
  month   = may,
  journal = "npj Computational Materials",
  volume  = "8",
  number  = "1",
  pages   = "102",
  issn    = "2057-3960",
  url     = "https://www.nature.com/articles/s41524-022-00784-w",
  doi     = "10.1038/s41524-022-00784-w"
}
```