astroBERT: a language model for astrophysics
This public repository contains the work of the NASA/ADS on building an NLP language model tailored to astrophysics, along with tutorials and miscellaneous related files.
This model is cased (it treats ads and ADS differently).
astroBERT models
- Base model: Pretrained on English-language text using masked language modeling (MLM) and next sentence prediction (NSP) objectives. It was introduced in this paper at ADASS 2021 and made public at ADASS 2022.
- NER-DEAL model: Adds a token classification head to the base model, finetuned on the DEAL@WIESP2022 named entity recognition task. Must be loaded from the revision='NER-DEAL' branch (see the NER-DEAL predictions tutorial).
- SciX Categorizer: Finetuned to classify text into one of the following categories of interest to SciX: Astronomy, Heliophysics, Planetary Science, Earth Science, NASA-funded Biophysics, Other Physics, Other, Text Garbage.
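As a rough illustration of the `revision` requirement for the NER-DEAL model, the sketch below loads the token classification head with the Hugging Face `transformers` library and pairs each token with its predicted label. The repository id `adsabs/astroBERT` is an assumption (it is not stated in this README), and the helper names `pair_labels` and `predict_entities` are hypothetical. The tutorials remain the reference for actual usage.

```python
REPO_ID = "adsabs/astroBERT"  # assumption: Hub repository id, not given in this README
NER_REVISION = "NER-DEAL"     # branch name given in the model list above


def pair_labels(tokens, label_ids, id2label):
    """Pair each token with the name of its predicted label (hypothetical helper)."""
    return [(token, id2label[label_id]) for token, label_id in zip(tokens, label_ids)]


def predict_entities(text):
    """Run the NER-DEAL model on one string and return (token, label) pairs."""
    # Imported lazily so pair_labels can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    # The model is cased, so lowercasing is disabled explicitly.
    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, do_lower_case=False)
    # The token classification head lives on the NER-DEAL branch, not the default one.
    model = AutoModelForTokenClassification.from_pretrained(
        REPO_ID, revision=NER_REVISION
    )
    model.eval()

    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Highest-scoring label per token for the single input sequence.
    label_ids = logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return pair_labels(tokens, label_ids, model.config.id2label)


if __name__ == "__main__":
    for token, label in predict_entities("We used data from the Hubble Space Telescope."):
        print(token, label)
```

Omitting `revision` would load the default branch, which holds the base model without the finetuned classification head.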
Tutorials
- generate text embeddings (for downstream tasks)
- use astroBERT for the Fill-Mask task
- make NER-DEAL predictions
- categorize texts for SciX
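For orientation, here is a minimal sketch of the first two tasks (text embeddings and Fill-Mask) using the Hugging Face `transformers` library. It is not taken from the tutorials: the repository id `adsabs/astroBERT` is an assumption, the helper names (`mask_word`, `mean_pool`, `embed`, `fill_mask`) are hypothetical, and mean pooling over token vectors is only one common way to obtain a sentence embedding.

```python
REPO_ID = "adsabs/astroBERT"  # assumption: Hub repository id, not given in this README


def mask_word(text, word, mask_token="[MASK]"):
    """Replace the first occurrence of `word` in `text` with the mask token."""
    if word not in text:
        raise ValueError(f"{word!r} does not occur in the text")
    return text.replace(word, mask_token, 1)


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return summed / counts


def embed(texts):
    """Return one embedding vector per input string (mean-pooled, an assumption)."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, do_lower_case=False)
    model = AutoModel.from_pretrained(REPO_ID)
    model.eval()
    inputs = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)
    return mean_pool(outputs.last_hidden_state, inputs["attention_mask"])


def fill_mask(text, word, top_k=3):
    """Mask `word` in `text` and return the model's top candidate tokens with scores."""
    from transformers import AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, do_lower_case=False)
    unmasker = pipeline("fill-mask", model=REPO_ID, tokenizer=tokenizer)
    masked = mask_word(text, word, tokenizer.mask_token)
    return [(r["token_str"], r["score"]) for r in unmasker(masked, top_k=top_k)]


if __name__ == "__main__":
    print(embed(["M31 is a spiral galaxy."]).shape)
    print(fill_mask("M31 is a spiral galaxy.", "galaxy"))
```

Because the model is cased, inputs should keep their original capitalization rather than being lowercased beforehand.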
BibTeX
@ARTICLE{2021arXiv211200590G,
author = {{Grezes}, Felix and {Blanco-Cuaresma}, Sergi and {Accomazzi}, Alberto and {Kurtz}, Michael J. and {Shapurian}, Golnaz and {Henneken}, Edwin and {Grant}, Carolyn S. and {Thompson}, Donna M. and {Chyla}, Roman and {McDonald}, Stephen and {Hostetler}, Timothy W. and {Templeton}, Matthew R. and {Lockhart}, Kelly E. and {Martinovic}, Nemanja and {Chen}, Shinyi and {Tanner}, Chris and {Protopapas}, Pavlos},
title = "{Building astroBERT, a language model for Astronomy \& Astrophysics}",
journal = {arXiv e-prints},
keywords = {Computer Science - Computation and Language, Astrophysics - Instrumentation and Methods for Astrophysics},
year = 2021,
month = dec,
eid = {arXiv:2112.00590},
pages = {arXiv:2112.00590},
archivePrefix = {arXiv},
eprint = {2112.00590},
primaryClass = {cs.CL},
adsurl = {https://ui.adsabs.harvard.edu/abs/2021arXiv211200590G},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}