PolymerNER

PolymerNER is a version of MaterialsBERT fine-tuned on a dataset of 638 abstracts, with a linear layer on top of MaterialsBERT that predicts the entity type of each token. The model predicts eight entity types: POLYMER, POLYMER_FAMILY, ORGANIC, INORGANIC, MONOMER, PROP_NAME, PROP_VALUE and MATERIAL_AMOUNT. This named entity recognition (NER) model was introduced in the paper listed in the Citation section; refer to that paper for a more detailed description of the entity types and the performance metrics of the model. Because MaterialsBERT is uncased, the NER model is also uncased.

Intended uses & limitations

You can use the model for sequence labeling (entity tagging) tasks on materials science text. The training, validation and test data for the model consisted of abstracts related to polymers. The entity types, however, are general, so the model can be applied to any materials science text to tag the entity types defined in its ontology.

How to use

Here is how to use the model to tag entities given some text:

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
tokenizer = AutoTokenizer.from_pretrained('pranav-s/PolymerNER', model_max_length=512)
model = AutoModelForTokenClassification.from_pretrained('pranav-s/PolymerNER')
ner_pipeline = pipeline(task="ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple", device=-1)  # device=-1 runs on the CPU
text = "Polyethylene has a glass transition temperature of -100 °C"
ner_output = ner_pipeline(text)  # list of dicts, one per tagged entity span
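
With aggregation_strategy="simple", the pipeline returns a list of dictionaries with the keys entity_group, score, word, start and end. The sketch below shows one possible way to collect the tagged spans by entity type. The helper group_entities and its min_score parameter are illustrative names, not part of the model or of Transformers, and the sample list is hand-written to mirror the output structure; it is not real model output.

```python
from collections import defaultdict


def group_entities(ner_output, min_score=0.0):
    """Group aggregated NER pipeline output by entity type.

    Hypothetical helper: keeps only spans whose confidence score is at
    least min_score and maps each entity type to the list of tagged words.
    """
    grouped = defaultdict(list)
    for entity in ner_output:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample with the same keys the pipeline produces.
# The scores and words are made up for illustration, NOT real model output.
sample_output = [
    {"entity_group": "POLYMER", "score": 0.99, "word": "polyethylene", "start": 0, "end": 12},
    {"entity_group": "PROP_NAME", "score": 0.98, "word": "glass transition temperature", "start": 19, "end": 47},
    {"entity_group": "PROP_VALUE", "score": 0.55, "word": "- 100 °c", "start": 51, "end": 58},
]

# Drop low-confidence spans, then group the rest by entity type.
grouped = group_entities(sample_output, min_score=0.6)
print(grouped)
```

In practice you would pass ner_output from the snippet above in place of sample_output. Because the model is uncased, the word field is expected to be lowercased; the start and end offsets can be used to recover the original casing from the input text.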

Training data

A training data set of 638 polymer abstracts was used. The data set is provided here.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 5
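
To make the linear lr_scheduler_type concrete, the following minimal sketch computes the learning rate at a given optimizer step under a linear decay from 5e-05 to zero. It assumes no warmup steps and a single training device, neither of which is stated above, and the number of training examples (500) is a hypothetical value chosen only to make the arithmetic visible. The function linear_lr is an illustrative name, not a Transformers API.

```python
import math


def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate at a given step for a linear schedule.

    Rises linearly during warmup (none by default, an assumption),
    then decays linearly to zero at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical training-set size; the actual split size is not given here.
num_examples = 500
train_batch_size = 8   # from the hyperparameters above
num_epochs = 5         # from the hyperparameters above

# Assumes the last partial batch of each epoch is kept.
steps_per_epoch = math.ceil(num_examples / train_batch_size)
total_steps = steps_per_epoch * num_epochs

for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps))
```

Under these assumptions the rate starts at 5e-05 on the first step and reaches zero on the last one; a non-zero warmup would change only the early part of the curve.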

Framework versions

Transformers 4.17.0
PyTorch 1.10.2
Datasets 1.18.3
Tokenizers 0.11.0

Citation

If you find PolymerNER useful in your research, please cite the following paper:

@article{materialsbert,
  title={A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing},
  author={Shetty, Pranav and Rajan, Arunkumar Chitteth and Kuenneth, Chris and Gupta, Sonakshi and Panchumarti, Lakshmi Prerana and Holm, Lauren and Zhang, Chao and Ramprasad, Rampi},
  journal={npj Computational Materials},
  volume={9},
  number={1},
  pages={52},
  year={2023},
  publisher={Nature Publishing Group UK London}
}