---
library_name: hierarchy-transformers
pipeline_tag: feature-extraction
tags:
- hierarchy-transformers
- feature-extraction
- hierarchy-encoding
- subsumption-relationships
- transformers
license: apache-2.0
language:
- en
metrics:
- precision
- recall
- f1
base_model:
- sentence-transformers/all-MiniLM-L12-v2
---

# Hierarchy-Transformers/HiT-MiniLM-L12-WordNetNoun

A **Hi**erarchy **T**ransformer Encoder (HiT) model that explicitly encodes entities according to their hierarchical relationships.

### Model Description

<!-- Provide a longer summary of what this model is. -->

HiT-MiniLM-L12-WordNet is a HiT model trained on WordNet's subsumption (hypernym) hierarchy of noun entities.

- **Developed by:** [Yuan He](https://www.yuanhe.wiki/), Zhangdie Yuan, Jiaoyan Chen, and Ian Horrocks
- **Model type:** Hierarchy Transformer Encoder (HiT) 
- **License:** Apache license 2.0
- **Hierarchy**: WordNet's subsumption (hypernym) hierarchy of noun entities.
- **Training Dataset**: Download `wordnet-mixed.zip` from [Datasets for HiTs on Zenodo](https://zenodo.org/doi/10.5281/zenodo.10511042)
- **Pre-trained model:** [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2)
- **Training Objectives**: Jointly optimised on *Hyperbolic Clustering* and *Hyperbolic Centripetal* losses (see definitions in the [paper](https://arxiv.org/abs/2401.11374))

### Model Versions

| **Version** | **Model Revision** | **Note** |
|------------|---------|----------|
|v1.0 (Random Negatives)| `main` or `v1-random-negatives`| The variant trained on random negatives, as detailed in the [paper](https://arxiv.org/abs/2401.11374).|
|v1.0 (Hard Negatives)| `v1-hard-negatives` | The variant trained on hard negatives, as detailed in the [paper](https://arxiv.org/abs/2401.11374). |


### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/KRR-Oxford/HierarchyTransformers
- **Paper:** [Language Models as Hierarchy Encoders](https://arxiv.org/abs/2401.11374)

## Usage

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

HiT models are used to encode entities (presented as texts) and predict their hierarhical relationships in hyperbolic space. 

### Get Started

Install `hierarchy_transformers` (check our [repository](https://github.com/KRR-Oxford/HierarchyTransformers)) through `pip` or `GitHub`.

Use the code below to get started with the model.

```python
from hierarchy_transformers import HierarchyTransformer

# load the model
model = HierarchyTransformer.from_pretrained('Hierarchy-Transformers/HiT-MiniLM-L12-WordNetNoun')

# entity names to be encoded.
entity_names = ["computer", "personal computer", "fruit", "berry"]

# get the entity embeddings
entity_embeddings = model.encode(entity_names)
```

### Default Probing for Subsumption Prediction

Use the entity embeddings to predict the subsumption relationships between them.

```python
# suppose we want to compare "personal computer" and "computer", "berry" and "fruit"
child_entity_embeddings = model.encode(["personal computer", "berry"], convert_to_tensor=True)
parent_entity_embeddings = model.encode(["computer", "fruit"], convert_to_tensor=True)

# compute the hyperbolic distances and norms of entity embeddings
dists = model.manifold.dist(child_entity_embeddings, parent_entity_embeddings)
child_norms = model.manifold.dist0(child_entity_embeddings)
parent_norms = model.manifold.dist0(parent_entity_embeddings)

# use the empirical function for subsumption prediction proposed in the paper
# `centri_score_weight` and the overall threshold are determined on the validation set
subsumption_scores = - (dists + centri_score_weight * (parent_norms - child_norms))
```

Training and evaluation scripts are available at [GitHub](https://github.com/KRR-Oxford/HierarchyTransformers/tree/main/scripts). See `scripts/evaluate.py` for how we determine the hyperparameters on the validation set for subsumption prediction.

Technical details are presented in the [paper](https://arxiv.org/abs/2401.11374).


## Full Model Architecture
```
HierarchyTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
)
```

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

Preprint on arxiv: https://arxiv.org/abs/2401.11374.

*Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks.* **Language Models as Hierarchy Encoders.** To Appear at NeurIPS 2024.

```
@article{he2024language,
  title={Language Models as Hierarchy Encoders},
  author={He, Yuan and Yuan, Zhangdie and Chen, Jiaoyan and Horrocks, Ian},
  journal={arXiv preprint arXiv:2401.11374},
  year={2024}
}
```


## Model Card Contact

For any queries or feedback, please contact Yuan He (`yuan.he(at)cs.ox.ac.uk`).