---
license: apache-2.0
language:
- en
base_model:
- nasa-impact/nasa-smd-ibm-v0.1
pipeline_tag: token-classification
tags:
- astronomy
- uat
---
# KAILAS
KAILAS (also known as Keyword Labeler At SciX, Indus-UAT-Labeler, or nasa-smd-ibm-v0.1_UAT_Labeler) is a RoBERTa-based, encoder-only transformer model, domain-adapted for NASA Science Mission Directorate (SMD) applications. It is fine-tuned on scientific journals and articles relevant to NASA SMD, with the aim of improving natural language technologies such as information retrieval and intelligent search.

This specific fork was fine-tuned on proprietary data from the SciX Digital Library (https://scixplorer.org/, formerly NASA-ADS) to label text with Unified Astronomy Thesaurus (UAT) keywords (https://astrothesaurus.org/).
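
As a rough illustration of how the labeler might be called, the sketch below runs the `transformers` token-classification pipeline and reduces its output to a list of distinct labels. It rests on several assumptions that this card does not confirm: `MODEL_ID` is a placeholder rather than the real repository id, the model's predictions are assumed to follow the standard pipeline output format (dicts with `entity_group` or `entity` and `score`), and the `collect_labels` helper and the 0.5 threshold are hypothetical choices made for the example.

```python
from typing import Dict, Iterable, List

# Placeholder, NOT the real repository id: replace with this model's Hub id or a local path.
MODEL_ID = "path/or/hub-id-of-this-model"


def collect_labels(predictions: Iterable[Dict], threshold: float = 0.5) -> List[str]:
    """Return distinct predicted labels scoring at or above `threshold`, best score first.

    Hypothetical helper: it assumes each prediction is a dict shaped like the
    standard token-classification pipeline output.
    """
    best: Dict[str, float] = {}
    for pred in predictions:
        # Aggregated pipeline output uses "entity_group"; raw output uses "entity".
        label = pred.get("entity_group") or pred.get("entity")
        score = float(pred["score"])
        if label is not None and score >= threshold:
            # Keep only the highest score seen for each label.
            best[label] = max(score, best.get(label, 0.0))
    return sorted(best, key=lambda name: best[name], reverse=True)


def label_text(text: str, threshold: float = 0.5) -> List[str]:
    """Run the model on `text` and return its labels (assumes MODEL_ID has been set)."""
    # Imported lazily so collect_labels can be used without transformers installed.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",
    )
    return collect_labels(tagger(text), threshold)
```

With `MODEL_ID` set, `label_text("We measure the star formation history of nearby dwarf galaxies.")` would return whatever labels the model assigns above the threshold. The threshold is worth tuning against a held-out set, since the appropriate cutoff for this model is not stated here.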
## Model Details
- **Base Model**: RoBERTa
- **Tokenizer**: Custom
- **Parameters**: 125M
## Training Data
- 18K titles, abstracts, body text, and acknowledgments from recent, high-quality astronomy papers
- Approximately 217M tokens
<!-- ## Note -->
<!-- ## Citation -->
<!-- If you find this work useful, please cite using the following bibtex citation: -->
<!-- ## Disclaimer -->