---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:40906
- loss:MatryoshkaLoss
- loss:MegaBatchMarginLoss
widget:
- source_sentence: >-
    One of three laminate structures that form the spindle pole body; the inner
    plaque is in the nucleus.
  sentences:
  - >-
    maturation of SSU-rRNA from tetracistronic rRNA transcript (SSU-rRNA, 5.8S
    rRNA, 2S rRNA, LSU-rRNA)
  - leukotriene receptor activity
  - inner plaque of spindle pole body
- source_sentence: >-
    The covalent attachment of a myristoyl group to the N-terminal amino acid
    residue of a protein.
  sentences:
  - MHC class I protein complex assembly
  - N-terminal protein myristoylation
  - neurotrophin receptor activity
- source_sentence: >-
    The inner, i.e. lumen-facing, lipid bilayer of the plastid envelope; also
    faces the plastid stroma.
  sentences:
  - plastid inner membrane
  - neuron migration involved in retrograde extension
  - stomatal complex morphogenesis
- source_sentence: >-
    Initiation of a region of tissue in a plant that is composed of one or more
    undifferentiated cells capable of undergoing mitosis and differentiation,
    thereby effecting growth and development of a plant by giving rise to more
    meristem or specialized tissue.
  sentences:
  - meristem initiation
  - polytene chromosome
  - cardiac ventricle development
- source_sentence: >-
    The sex chromosome present in both sexes of species in which the male is the
    heterogametic sex. Two copies of the X chromosome are present in each
    somatic cell of females and one copy is present in males.
  sentences:
  - establishment of cell polarity involved in gastrulation cell migration
  - X chromosome
  - somatic diversification of immune receptors by N region addition
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- src2trg_accuracy
- trg2src_accuracy
- mean_accuracy
model-index:
- name: SentenceTransformer
  results:
  - task:
      type: translation
      name: Translation
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: src2trg_accuracy
      value: 0.00015186028853454822
      name: Src2Trg Accuracy
    - type: trg2src_accuracy
      value: 0
      name: Trg2Src Accuracy
    - type: mean_accuracy
      value: 0.00007593014426727411
      name: Mean Accuracy
license: apache-2.0
datasets:
- NothingMuch/GO-Terms
language:
- en
base_model:
- Snowflake/snowflake-arctic-embed-m-v1.5
---
# SentenceTransformer
This is a [sentence-transformers](https://www.SBERT.net) model trained on the parquet dataset; the card metadata lists the dataset as `NothingMuch/GO-Terms` and the base model as `Snowflake/snowflake-arctic-embed-m-v1.5`. It maps sentences and paragraphs to a 128-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 128 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - parquet
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
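The `Pooling` and `Normalize` stages above can be illustrated with a minimal sketch. It assumes that CLS pooling takes the first token's hidden state as the sentence vector and that `Normalize()` rescales it to unit L2 length. The helper names `cls_pool` and `l2_normalize` and the 4-dimensional toy vectors are hypothetical stand-ins for the 768-dimensional hidden states; this is not the library's implementation.

```python
import math


def cls_pool(token_embeddings):
    """Use the first token's vector ([CLS]) as the sentence embedding."""
    return token_embeddings[0]


def l2_normalize(vector):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


# Toy 3-token sequence with 4-dimensional hidden states (hypothetical values).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.2, 0.1, 0.9],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity between two such embeddings reduces to their dot product.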
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("GO-Term-Embeddings")
# Run inference
sentences = [
    'The sex chromosome present in both sexes of species in which the male is the heterogametic sex. Two copies of the X chromosome are present in each somatic cell of females and one copy is present in males.',
    'X chromosome',
    'somatic diversification of immune receptors by N region addition',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 128]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
## Evaluation
### Metrics
#### Translation
* Evaluated with [TranslationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TranslationEvaluator)
| Metric | Value |
|:------------------|:-----------|
| src2trg_accuracy | 0.0002 |
| trg2src_accuracy | 0.0 |
| **mean_accuracy** | **0.0001** |
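
To make these numbers easier to read, here is a hedged sketch of how a translation-style retrieval accuracy can be computed. It assumes a source item counts as correct when its most cosine-similar target is the one at the same index, and that `mean_accuracy` is the average of the two directions. The unrounded metadata values fit that reading: 0.00007593 is half of 0.00015186, and 0.00015186 equals 1/6,585, which would correspond to a single correct match if the evaluator ran over all 6,585 evaluation pairs (an assumption). The function names and toy vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieval_accuracy(queries, candidates):
    """Fraction of queries whose best-scoring candidate shares their index."""
    correct = 0
    for i, query in enumerate(queries):
        scores = [cosine(query, candidate) for candidate in candidates]
        best = max(range(len(scores)), key=scores.__getitem__)
        correct += int(best == i)
    return correct / len(queries)


# Toy aligned pairs: source[i] is meant to match target[i] (hypothetical data).
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]

src2trg = retrieval_accuracy(source, target)
trg2src = retrieval_accuracy(target, source)
mean_accuracy = (src2trg + trg2src) / 2
print(src2trg, trg2src, mean_accuracy)
```

The two directions can differ, as in this toy data, because the nearest neighbour relation is not symmetric.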
## Training Details
### Training Dataset
#### parquet
* Dataset: parquet
* Size: 40,906 training samples
* Columns: `anchor` and `positive`
* Approximate statistics based on the first 1000 samples:
  |      | anchor | positive |
  |:-----|:-------|:---------|
  | type | string | string   |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | Catalysis of the transfer of a mannose residue to an oligosaccharide, forming an alpha-(1->6) linkage. | 1,6-alpha-mannosyltransferase activity |
  | Catalysis of the hydrolysis of ester linkages within a single-stranded deoxyribonucleic acid molecule by creating internal breaks. | single-stranded DNA specific endodeoxyribonuclease activity |
  | Catalysis of the hydrolysis of ester linkages within a single-stranded deoxyribonucleic acid molecule by creating internal breaks. | ssDNA-specific endodeoxyribonuclease activity |
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
    "loss": "MegaBatchMarginLoss",
    "matryoshka_dims": [
        64,
        32
    ],
    "matryoshka_weights": [
        1,
        1
    ],
    "n_dims_per_step": -1
}
```
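
The `matryoshka_dims` setting means the loss is also applied to the leading 64 and 32 dimensions of each embedding, so those prefixes are intended to be usable on their own. The sketch below shows one way to use such a prefix: keep the first `dim` values and re-normalize to unit length. It assumes a plain Python list as the embedding, and the 8-dimensional toy vector with cut-offs 4 and 2 is a hypothetical stand-in for the model's output with cut-offs 64 and 32. The helper name `truncate_and_normalize` is not part of any library.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit L2 length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]


# Toy 8-dimensional embedding standing in for the full model output.
full_embedding = [0.5, 0.5, 0.4, 0.3, 0.3, 0.2, 0.2, 0.2]

for dim in (4, 2):  # stand-ins for the configured 64 and 32
    shortened = truncate_and_normalize(full_embedding, dim)
    length = math.sqrt(sum(x * x for x in shortened))
    print(dim, shortened, round(length, 6))
```

When loading the model with Sentence Transformers, the `truncate_dim` argument of `SentenceTransformer` offers a similar truncation; whether the truncated vectors are re-normalized afterwards should be checked against the library documentation.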
### Evaluation Dataset
#### parquet
* Dataset: parquet
* Size: 6,585 evaluation samples
* Columns: `anchor` and `positive`
* Approximate statistics based on the first 1000 samples:
  |      | anchor | positive |
  |:-----|:-------|:---------|
  | type | string | string   |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome. | mitochondrial genome maintenance |
  | The repair of single strand breaks in DNA. Repair of such breaks is mediated by the same enzyme systems as are used in base excision repair. | single strand break repair |
  | Any process that modulates the frequency, rate or extent of DNA recombination, a DNA metabolic process in which a new genotype is formed by reassortment of genes resulting in gene combinations different from those that were present in the parents. | regulation of DNA recombination |
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
    "loss": "MegaBatchMarginLoss",
    "matryoshka_dims": [
        64,
        32
    ],
    "matryoshka_weights": [
        1,
        1
    ],
    "n_dims_per_step": -1
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 5
- `torch_empty_cache_steps`: 200
- `learning_rate`: 0.2
- `weight_decay`: 0.001
- `num_train_epochs`: 1
- `warmup_ratio`: 0.25
- `seed`: 25
- `batch_sampler`: no_duplicates
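
As a rough guide to what these settings imply, the sketch below estimates the number of optimizer steps and warmup steps for one epoch and evaluates a linear warmup and decay schedule. It rests on several assumptions that the card does not state: a single device, no gradient accumulation, a linear scheduler, warmup steps rounded up, and a `no_duplicates` sampler that yields the same number of batches as a plain sampler. The function `linear_schedule` is a hypothetical illustration, not the trainer's code.

```python
import math

# Values taken from the card.
train_samples = 40_906
per_device_train_batch_size = 10
num_train_epochs = 1
warmup_ratio = 0.25
peak_learning_rate = 0.2

# Assumption: one device and no gradient accumulation.
steps_per_epoch = math.ceil(train_samples / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_schedule(step):
    """Linear warmup to the peak rate, then linear decay to zero (assumed)."""
    if step < warmup_steps:
        return peak_learning_rate * step / warmup_steps
    remaining = total_steps - step
    return peak_learning_rate * max(0.0, remaining / (total_steps - warmup_steps))


print(total_steps, warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, linear_schedule(step))
```

Under these assumptions the learning rate would climb toward 0.2 over roughly the first quarter of training before decaying.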
#### All Hyperparameters