---
license: apache-2.0
language:
- en
library_name: sentence-transformers
tags:
- earth science
- climate
- biology
pipeline_tag: sentence-similarity
---
**This model is deprecated. Please use the updated sentence transformer model here: https://huggingface.co/nasa-impact/nasa-smd-ibm-st-v2.
Alternatively, you can use the distilled version of the model here: https://huggingface.co/nasa-impact/nasa-ibm-st.38m**
**FOR ARCHIVAL PURPOSES ONLY**
# Model Card for nasa-smd-ibm-st
`nasa-smd-ibm-st`, also known as `Indus-st`, is a bi-encoder sentence transformer model fine-tuned from the nasa-smd-ibm-v0.1 encoder model. It is trained on 271 million examples along with a domain-specific dataset of 2.6 million examples from documents curated by the NASA Science Mission Directorate (SMD). With this model, we aim to enhance natural language technologies such as information retrieval and intelligent search as they apply to SMD NLP applications.
## Model Details
- **Base Model**: nasa-smd-ibm-v0.1 (Indus)
- **Tokenizer**: Custom
- **Parameters**: 125M
- **Training Strategy**: Sentence pairs with a score indicating relevancy. The model encodes the two sentences independently, cosine similarity between the embeddings is calculated, and the similarity is optimized using the relevance score.
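The scoring step of the bi-encoder can be sketched in isolation. The vectors below are made-up low-dimensional embeddings standing in for the encoder outputs; the point is only the cosine-similarity computation used to score a sentence pair:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors, as used to
    # score a sentence pair produced by the two encoder passes.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings standing in for model.encode() outputs.
query_emb = np.array([0.2, 0.8, 0.1, 0.5])
passage_emb = np.array([0.25, 0.75, 0.0, 0.55])

score = cosine_sim(query_emb, passage_emb)  # close to 1.0 for similar vectors
```

During fine-tuning, this predicted similarity is pushed toward the labeled relevance score for each pair.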
## Training Data
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/ZjcHW24iKsvUYBhoL7eMM.png)
Figure: Open dataset sources for sentence transformers (269M in total)
Additionally, 2.6M abstract + title pairs collected from NASA SMD documents.
## Training Procedure
- **Framework**: PyTorch 1.9.1
- **sentence-transformers version**: 4.30.2
- **Strategy**: Sentence Pairs
## Evaluation
The following models are evaluated:
1. All-MiniLM-l6-v2 [sentence-transformers/all-MiniLM-L6-v2]
2. BGE-base [BAAI/bge-base-en-v1.5]
3. RoBERTa-base [roberta-base]
4. nasa-smd-ibm-rtvr_v0.1 [nasa-impact/nasa-smd-ibm-st]
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/At5uL6OU2k6J0hIBlVLBu.png)
Figure: BEIR Evaluation Metrics
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61099e5d86580d4580767226/pwh_CktqJYrQuiKGHxfWj.png)
Figure: Retrieval Benchmark Evaluation
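BEIR reports ranking metrics such as nDCG@10. As an illustration of how such a metric is computed (a generic sketch, not tied to the specific numbers in the figures above):

```python
import numpy as np

def dcg(relevances):
    # Discounted cumulative gain: each relevance is discounted
    # by the log2 of its (1-indexed) rank plus one.
    return sum(r / np.log2(i + 2) for i, r in enumerate(relevances))

def ndcg(ranked_rels, k=10):
    # Normalize DCG of the ranking by the DCG of the ideal ordering.
    ideal = sorted(ranked_rels, reverse=True)
    denom = dcg(ideal[:k])
    return dcg(ranked_rels[:k]) / denom if denom > 0 else 0.0

# Toy example: the one relevant document is retrieved at rank 2 of 3.
score = ndcg([0, 1, 0])  # 1/log2(3), about 0.63
```

A perfect ranking (relevant documents first) scores 1.0; pushing relevant documents down the ranking lowers the score.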
## Uses
- Information Retrieval
- Sentence Similarity Search
For NASA SMD-related scientific use cases.
### Usage
```python
from sentence_transformers import SentenceTransformer, util

# Load the model from a local path or the Hugging Face Hub.
model = SentenceTransformer('path_to_slate_model')

input_queries = [
    'query: how much protein should a female eat', 'query: summit define']
input_passages = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments."]

# Encode queries and passages independently (bi-encoder), then
# score every query-passage pair with cosine similarity.
query_embeddings = model.encode(input_queries)
passage_embeddings = model.encode(input_passages)
print(util.cos_sim(query_embeddings, passage_embeddings))
```
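`util.cos_sim` returns a queries-by-passages similarity matrix, and retrieval then amounts to ranking passages by these scores. A minimal sketch, using a made-up matrix standing in for real model scores:

```python
import numpy as np

# Mock 2x2 similarity matrix (queries x passages); real values would
# come from util.cos_sim on the encoded queries and passages.
sims = np.array([[0.72, 0.11],
                 [0.09, 0.68]])

# For each query, retrieve the passage with the highest similarity.
best = sims.argmax(axis=1)  # index of the top passage per query
```

For top-k retrieval over a larger corpus, `np.argsort` (or `util.semantic_search` from sentence-transformers) would be used instead of a single `argmax`.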
# Note
This model is released in support of the training and evaluation of the encoder language model ["Indus"](https://huggingface.co/nasa-impact/nasa-smd-ibm-v0.1).
The accompanying paper can be found here: https://arxiv.org/abs/2405.10725
## Citation
If you find this work useful, please cite using the following bibtex citation:
```bibtex
@misc {nasa-impact_2023,
author = { Aashka Trivedi and Bishwaranjan Bhattacharjee and Muthukumaran Ramasubramanian and Iksha Gurung and Masayasu Muraoka and Rahul Ramachandran and Manil Maskey and Kaylin Bugbee and Mike Little and Elizabeth Fancher and Lauren Sanders and Sylvain Costes and Sergi Blanco-Cuaresma and Kelly Lockhart and Thomas Allen and Felix Grazes and Megan Ansdell and Alberto Accomazzi and Sanaz Vahidinia and Ryan McGranaghan and Armin Mehrabian and Tsendgar Lee},
title = { nasa-smd-ibm-st (Revision 08ac2b4) },
year = 2023,
url = { https://huggingface.co/nasa-impact/nasa-smd-ibm-st },
doi = { 10.57967/hf/1441 },
publisher = { Hugging Face }
}
```
## Attribution
IBM Research
- Aashka Trivedi
- Masayasu Muraoka
- Bishwaranjan Bhattacharjee
NASA SMD
- Muthukumaran Ramasubramanian
- Iksha Gurung
- Rahul Ramachandran
- Manil Maskey
- Kaylin Bugbee
- Mike Little
- Elizabeth Fancher
- Lauren Sanders
- Sylvain Costes
- Sergi Blanco-Cuaresma
- Kelly Lockhart
- Thomas Allen
- Felix Grazes
- Megan Ansdell
- Alberto Accomazzi
- Sanaz Vahidinia
- Ryan McGranaghan
- Armin Mehrabian
- Tsendgar Lee
## Disclaimer
This sentence-transformer model is currently in an experimental phase. We are working to improve the model's capabilities and performance, and as we progress, we invite the community to engage with this model, provide feedback, and contribute to its evolution.