|
--- |
|
base_model: Alibaba-NLP/gte-large-en-v1.5 |
|
datasets: [] |
|
language: |
|
- en |
|
library_name: sentence-transformers |
|
license: apache-2.0 |
|
metrics: |
|
- cosine_accuracy@1 |
|
- cosine_accuracy@3 |
|
- cosine_accuracy@5 |
|
- cosine_accuracy@10 |
|
- cosine_precision@1 |
|
- cosine_precision@3 |
|
- cosine_precision@5 |
|
- cosine_precision@10 |
|
- cosine_recall@1 |
|
- cosine_recall@3 |
|
- cosine_recall@5 |
|
- cosine_recall@10 |
|
- cosine_ndcg@10 |
|
- cosine_mrr@10 |
|
- cosine_map@100 |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- sentence-transformers |
|
- sentence-similarity |
|
- feature-extraction |
|
- generated_from_trainer |
|
- dataset_size:4275 |
|
- loss:MatryoshkaLoss |
|
- loss:MultipleNegativesRankingLoss |
|
widget: |
|
- source_sentence: The fundamental elements of Goldman Sachs’ robust risk culture |
|
include governance, risk identification, measurement, mitigation, culture and |
|
conduct, and infrastructure. They believe these elements work together to complement |
|
and reinforce each other to produce a comprehensive view of risk management. |
|
sentences: |
|
- What are the financial highlights for Bank of America Corp. in its latest fiscal |
|
year report? |
|
- What is Berkshire Hathaway's involvement in the energy sector? |
|
- What is Goldman Sach’s approach towards maintaining a robust risk culture? |
|
- source_sentence: HealthTech Inc.'s new drug for diabetes treatment, launched in |
|
2021, contributed to approximately 30% of its total revenues for that year. |
|
sentences: |
|
- What is IBM's debt to equity ratio as of 2022? |
|
- In what way does HealthTech Inc's new drug contribute to its revenue generation? |
|
- What is the revenue breakdown of Alphabet for the year 2021? |
|
- source_sentence: The driving factor behind Tesla’s 2023 growth was the surge in |
|
demand for electric vehicles. |
|
sentences: |
|
- Why did McDonald's observe a decrease in overall revenue in 2023 relative to 2022? |
|
- What key strategy did Walmart employ to boost its sales in 2016? |
|
- What was the driving factor behind Tesla's growth in 2023? |
|
- source_sentence: Pfizer is committed to ensuring that people around the world have |
|
access to its medical products. In line with this commitment, Pfizer has implemented |
|
programs such as donation drives, price reduction initiatives, and patient assistance |
|
programs to aid those in need. Furthermore, through partnerships with NGOs and |
|
governments, Pfizer strives to strengthen healthcare systems in underprivileged |
|
regions. |
|
sentences: |
|
- What is the strategy of Pfizer to improve access to medicines in underprivileged |
|
areas? |
|
- What percentage of growth in revenue did Adobe Systems report in June 2020? |
|
- How is Citigroup differentiating itself among other banks? |
|
- source_sentence: JP Morgan reported total deposits of $2.6 trillion in the year |
|
ending December 31, 2023. |
|
sentences: |
|
- In the fiscal year 2023, what impact did the acquisition of T-Mobile bring to |
|
the revenue of AT&T? |
|
- What is the primary source of revenue for the software company, Microsoft? |
|
- What were JP Morgan's total deposits in 2023? |
|
model-index: |
|
- name: gte-large-en-v1.5-financial-rag-matryoshka |
|
results: |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: dim 1024 |
|
type: dim_1024 |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.88 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.96 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.9866666666666667 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.9955555555555555 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.88 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.32 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.19733333333333336 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.09955555555555556 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.88 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.96 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.9866666666666667 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.9955555555555555 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.9426916896167131 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.9251851851851851 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.925362962962963 |
|
name: Cosine Map@100 |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: dim 768 |
|
type: dim_768 |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.88 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.96 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.9866666666666667 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.9911111111111112 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.88 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.32 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.19733333333333336 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.09911111111111114 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.88 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.96 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.9866666666666667 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.9911111111111112 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.940825047039427 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.924 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.9245274971941638 |
|
name: Cosine Map@100 |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: dim 512 |
|
type: dim_512 |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.8711111111111111 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.96 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.9866666666666667 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.9911111111111112 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.8711111111111111 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.32 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.19733333333333336 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.09911111111111114 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.8711111111111111 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.96 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.9866666666666667 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.9911111111111112 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.938126332642602 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.9202962962962962 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.9207248677248678 |
|
name: Cosine Map@100 |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: dim 256 |
|
type: dim_256 |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.8755555555555555 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.96 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.9866666666666667 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.9911111111111112 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.8755555555555555 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.32 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.19733333333333336 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.09911111111111114 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.8755555555555555 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.96 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.9866666666666667 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.9911111111111112 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.9395718726230007 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.9222962962962963 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.9227724867724867 |
|
name: Cosine Map@100 |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: dim 128 |
|
type: dim_128 |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.8666666666666667 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.9555555555555556 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.9866666666666667 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.9911111111111112 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.8666666666666667 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.3185185185185185 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.19733333333333336 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.09911111111111114 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.8666666666666667 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.9555555555555556 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.9866666666666667 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.9911111111111112 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.9346269584282435 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.9157037037037037 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.9160403095943067 |
|
name: Cosine Map@100 |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: dim 64 |
|
type: dim_64 |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.8311111111111111 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 0.96 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 0.9733333333333334 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 0.9911111111111112 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.8311111111111111 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.32 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.19466666666666665 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.09911111111111114 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.8311111111111111 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 0.96 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 0.9733333333333334 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 0.9911111111111112 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.9208110890988729 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.8971957671957672 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.8975242479721762 |
|
name: Cosine Map@100 |
|
--- |

# financial-rag-matryoshka

A model fine-tuned for financial use cases from [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5). It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

The model is tuned for financial document retrieval while aiming to preserve as much general-purpose performance as possible.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [Alibaba-NLP/gte-large-en-v1.5](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) <!-- at revision a0d6174973604c8ef416d9f6ed0f4c17ab32d78d -->
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
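
The pooling configuration above enables only `pooling_mode_cls_token`, so the sentence embedding is taken from the first token position rather than averaged over all tokens. The sketch below illustrates that idea in plain Python on made-up numbers; the helper names `cls_pooling` and `mean_pooling` are hypothetical and are not part of the library.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_embeddings: List[Vector]) -> Vector:
    """Return the first token's vector (the [CLS] position) as the sentence embedding."""
    if not token_embeddings:
        raise ValueError("expected at least one token embedding")
    return list(token_embeddings[0])


def mean_pooling(token_embeddings: List[Vector]) -> Vector:
    """Average all token vectors; shown only for contrast with CLS pooling."""
    n = len(token_embeddings)
    return [sum(column) / n for column in zip(*token_embeddings)]


# Toy sequence of 3 tokens with 4-dimensional vectors (illustrative values only).
tokens = [
    [1.0, 0.0, 2.0, 0.0],  # [CLS] position
    [0.0, 3.0, 0.0, 3.0],
    [2.0, 0.0, 1.0, 0.0],
]

sentence_embedding = cls_pooling(tokens)
```

In the real model each token vector has 1024 components and comes from the transformer's last hidden state; only the choice of position is the same.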

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub.
# trust_remote_code=True is likely required because the base model
# (gte-large-en-v1.5) ships custom modeling code ("NewModel").
model = SentenceTransformer(
    "rbhatia46/gte-large-en-v1.5-financial-rag-matryoshka",
    trust_remote_code=True,
)
# Run inference
sentences = [
    'JP Morgan reported total deposits of $2.6 trillion in the year ending December 31, 2023.',
    "What were JP Morgan's total deposits in 2023?",
    'What is the primary source of revenue for the software company, Microsoft?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
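
Because the model was trained with a Matryoshka objective, the leading components of each embedding are meant to be usable on their own (the card evaluates 1024, 768, 512, 256, 128, and 64 dimensions). A minimal sketch of that workflow, assuming you already have embeddings as plain lists: keep the first `dim` values, re-normalize, and rank by cosine similarity. The vectors and helper names below are hypothetical toy stand-ins, not model output.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components, then rescale to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two unit vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def rank(query: Sequence[float], passages: Sequence[Sequence[float]], dim: int) -> List[int]:
    """Return passage indices ordered from most to least similar to the query."""
    q = truncate_and_normalize(query, dim)
    scores = [cosine(q, truncate_and_normalize(p, dim)) for p in passages]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


# Toy 8-dimensional "embeddings" (illustrative values only).
query = [0.9, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3, 0.1]
passages = [
    [0.8, 0.2, 0.1, 0.1, 0.0, 0.1, 0.2, 0.0],  # points in a similar direction
    [0.0, 0.9, 0.8, 0.0, 0.1, 0.7, 0.0, 0.2],  # points elsewhere
]

full_ranking = rank(query, passages, dim=8)
short_ranking = rank(query, passages, dim=4)
```

With real embeddings, the same truncation can also be requested from the library through the `truncate_dim` argument of `SentenceTransformer`; the evaluation tables in this card indicate how much retrieval quality to expect at each size.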

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `dim_1024`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.88       |
| cosine_accuracy@3   | 0.96       |
| cosine_accuracy@5   | 0.9867     |
| cosine_accuracy@10  | 0.9956     |
| cosine_precision@1  | 0.88       |
| cosine_precision@3  | 0.32       |
| cosine_precision@5  | 0.1973     |
| cosine_precision@10 | 0.0996     |
| cosine_recall@1     | 0.88       |
| cosine_recall@3     | 0.96       |
| cosine_recall@5     | 0.9867     |
| cosine_recall@10    | 0.9956     |
| cosine_ndcg@10      | 0.9427     |
| cosine_mrr@10       | 0.9252     |
| **cosine_map@100**  | **0.9254** |

#### Information Retrieval

* Dataset: `dim_768`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.88       |
| cosine_accuracy@3   | 0.96       |
| cosine_accuracy@5   | 0.9867     |
| cosine_accuracy@10  | 0.9911     |
| cosine_precision@1  | 0.88       |
| cosine_precision@3  | 0.32       |
| cosine_precision@5  | 0.1973     |
| cosine_precision@10 | 0.0991     |
| cosine_recall@1     | 0.88       |
| cosine_recall@3     | 0.96       |
| cosine_recall@5     | 0.9867     |
| cosine_recall@10    | 0.9911     |
| cosine_ndcg@10      | 0.9408     |
| cosine_mrr@10       | 0.924      |
| **cosine_map@100**  | **0.9245** |

#### Information Retrieval

* Dataset: `dim_512`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8711     |
| cosine_accuracy@3   | 0.96       |
| cosine_accuracy@5   | 0.9867     |
| cosine_accuracy@10  | 0.9911     |
| cosine_precision@1  | 0.8711     |
| cosine_precision@3  | 0.32       |
| cosine_precision@5  | 0.1973     |
| cosine_precision@10 | 0.0991     |
| cosine_recall@1     | 0.8711     |
| cosine_recall@3     | 0.96       |
| cosine_recall@5     | 0.9867     |
| cosine_recall@10    | 0.9911     |
| cosine_ndcg@10      | 0.9381     |
| cosine_mrr@10       | 0.9203     |
| **cosine_map@100**  | **0.9207** |

#### Information Retrieval

* Dataset: `dim_256`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8756     |
| cosine_accuracy@3   | 0.96       |
| cosine_accuracy@5   | 0.9867     |
| cosine_accuracy@10  | 0.9911     |
| cosine_precision@1  | 0.8756     |
| cosine_precision@3  | 0.32       |
| cosine_precision@5  | 0.1973     |
| cosine_precision@10 | 0.0991     |
| cosine_recall@1     | 0.8756     |
| cosine_recall@3     | 0.96       |
| cosine_recall@5     | 0.9867     |
| cosine_recall@10    | 0.9911     |
| cosine_ndcg@10      | 0.9396     |
| cosine_mrr@10       | 0.9223     |
| **cosine_map@100**  | **0.9228** |

#### Information Retrieval

* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value     |
|:--------------------|:----------|
| cosine_accuracy@1   | 0.8667    |
| cosine_accuracy@3   | 0.9556    |
| cosine_accuracy@5   | 0.9867    |
| cosine_accuracy@10  | 0.9911    |
| cosine_precision@1  | 0.8667    |
| cosine_precision@3  | 0.3185    |
| cosine_precision@5  | 0.1973    |
| cosine_precision@10 | 0.0991    |
| cosine_recall@1     | 0.8667    |
| cosine_recall@3     | 0.9556    |
| cosine_recall@5     | 0.9867    |
| cosine_recall@10    | 0.9911    |
| cosine_ndcg@10      | 0.9346    |
| cosine_mrr@10       | 0.9157    |
| **cosine_map@100**  | **0.916** |

#### Information Retrieval

* Dataset: `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8311     |
| cosine_accuracy@3   | 0.96       |
| cosine_accuracy@5   | 0.9733     |
| cosine_accuracy@10  | 0.9911     |
| cosine_precision@1  | 0.8311     |
| cosine_precision@3  | 0.32       |
| cosine_precision@5  | 0.1947     |
| cosine_precision@10 | 0.0991     |
| cosine_recall@1     | 0.8311     |
| cosine_recall@3     | 0.96       |
| cosine_recall@5     | 0.9733     |
| cosine_recall@10    | 0.9911     |
| cosine_ndcg@10      | 0.9208     |
| cosine_mrr@10       | 0.8972     |
| **cosine_map@100**  | **0.8975** |
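
In every table above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example 0.96 / 3 = 0.32). That pattern is what you would expect if each evaluation query has exactly one relevant passage; this is an inference from the numbers, not something the card states. The sketch below computes the per-query metrics under that assumption. The function names and document ids are hypothetical, and this is not the evaluator's actual code.

```python
import math
from typing import Dict, List, Sequence


def metrics_at_k(ranked_ids: Sequence[str], relevant_id: str, k: int) -> Dict[str, float]:
    """Metrics for one query that has exactly one relevant document (assumption)."""
    top_k = list(ranked_ids[:k])
    hit = relevant_id in top_k
    rank = top_k.index(relevant_id) + 1 if hit else 0
    return {
        "accuracy": 1.0 if hit else 0.0,          # relevant doc appears in the top k
        "precision": 1.0 / k if hit else 0.0,     # 1 relevant doc out of k retrieved
        "recall": 1.0 if hit else 0.0,            # 1 relevant doc out of 1 existing
        "mrr": 1.0 / rank if hit else 0.0,        # reciprocal rank of the hit
        "ndcg": 1.0 / math.log2(rank + 1) if hit else 0.0,  # ideal DCG is 1 here
    }


def average(rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over all queries."""
    return {key: sum(row[key] for row in rows) / len(rows) for key in rows[0]}


# Three toy queries: relevant passage at rank 1, rank 2, and rank 5.
rankings = [
    (["d1", "d2", "d3", "d4", "d5"], "d1"),
    (["d7", "d6", "d8", "d9", "d0"], "d6"),
    (["a", "b", "c", "d", "e"], "e"),
]

results_at_3 = average([metrics_at_k(ids, rel, k=3) for ids, rel in rankings])
```

With a single relevant passage per query, accuracy@1, precision@1, and recall@1 all coincide, which matches the tables as well.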

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 4,275 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 1000 samples:

  |         | positive | anchor |
  |:--------|:---------|:-------|
  | type    | string   | string |
  | details | <ul><li>min: 15 tokens</li><li>mean: 44.74 tokens</li><li>max: 114 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 18.12 tokens</li><li>max: 32 tokens</li></ul> |
* Samples:

  | positive | anchor |
  |:---------|:-------|
  | <code>At the end of fiscal year 2023, Exxon Mobil reported a debt-to-equity ratio of 0.32, implying that the company used more equity than debt in its capital structure.</code> | <code>What was the debt-to-equity ratio for Exxon Mobil at the end of fiscal year 2023?</code> |
  | <code>Amazon Web Services (AWS) generated $12.7 billion in net sales in the fourth quarter of 2020, up 28% from the same period of the previous year. It accounted for about 10% of Amazon’s total net sales for the quarter.</code> | <code>How did Amazon's AWS segment perform in the fourth quarter of 2020?</code> |
  | <code>JPMorgan Chase generates revenues by providing a wide range of banking and financial services. These include investment banking (M&As, advisory), consumer and community banking (home mortgages, auto loans), commercial banking, and asset and wealth management.</code> | <code>What are the key revenue sources for JPMorgan Chase?</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:

  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          1024,
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
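
To make the loss configuration above more concrete, here is a rough pure-Python sketch of the objective as it is commonly described: for each Matryoshka dimension, truncate and re-normalize the anchor and positive embeddings, score every anchor against every positive in the batch, and apply cross-entropy so that each anchor prefers its own positive. The per-dimension losses are then combined with the configured weights. The function names, the toy vectors, and the similarity scale of 20 are assumptions for illustration; this is not the library's implementation.

```python
import math
from typing import List, Sequence


def _unit(vec: Sequence[float], dim: int) -> List[float]:
    """Truncate to `dim` components and rescale to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]


def in_batch_ranking_loss(anchors, positives, dim: int, scale: float = 20.0) -> float:
    """Cross-entropy over scaled cosine similarities; positives[i] is the target of anchors[i]."""
    a = [_unit(v, dim) for v in anchors]
    p = [_unit(v, dim) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(anchor, cand)) for cand in p]
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[i]  # negative log-probability of the true pair
    return total / len(a)


def matryoshka_loss(anchors, positives, dims: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted sum of the ranking loss evaluated at each truncated dimensionality."""
    return sum(w * in_batch_ranking_loss(anchors, positives, d) for d, w in zip(dims, weights))


# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]

aligned = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
shuffled = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
```

The correctly paired batch yields a much smaller loss than the shuffled one, which is the behavior the objective rewards at every truncation length.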

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 10
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
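
The batch size, accumulation, and epoch settings above determine how many optimizer steps the run takes. A worked sketch of that arithmetic follows, assuming a single training device (the card does not state the device count) and a warmup-then-cosine schedule of the usual form. The results line up with the step and epoch values reported in the training logs of this card, but the code is an illustration, not the trainer's own logic.

```python
import math

train_samples = 4275       # from the training dataset section
per_device_batch = 32
grad_accum = 16
epochs = 10
warmup_ratio = 0.1
peak_lr = 2e-05
num_devices = 1            # assumption: single-device training

batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_steps = updates_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
effective_batch = per_device_batch * grad_accum * num_devices


def epoch_at(step: int) -> float:
    """Fraction of epochs completed after `step` optimizer updates."""
    return step * grad_accum / batches_per_epoch


def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(batches_per_epoch, total_steps, warmup_steps, effective_batch)
print(round(epoch_at(8), 4), epoch_at(67))
```

Under these assumptions each optimizer update sees 512 examples, and one epoch corresponds to 134 mini-batches, which is why step 8 falls slightly before the end of the first epoch.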

#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs

| Epoch   | Step   | Training Loss | dim_1024_cosine_map@100 | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|:-------:|:------:|:-------------:|:-----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
| 0.9552  | 8      | -             | 0.9090                  | 0.8848                 | 0.8992                 | 0.9052                 | 0.8775                | 0.9030                 |
| 1.1940  | 10     | 0.4749        | -                       | -                      | -                      | -                      | -                     | -                      |
| 1.9104  | 16     | -             | 0.9170                  | 0.9095                 | 0.9109                 | 0.9201                 | 0.8961                | 0.9212                 |
| 2.3881  | 20     | 0.0862        | -                       | -                      | -                      | -                      | -                     | -                      |
| 2.9851  | 25     | -             | 0.9190                  | 0.9071                 | 0.9160                 | 0.9278                 | 0.8998                | 0.9234                 |
| 3.5821  | 30     | 0.0315        | -                       | -                      | -                      | -                      | -                     | -                      |
| 3.9403  | 33     | -             | 0.9183                  | 0.9053                 | 0.9122                 | 0.9287                 | 0.8998                | 0.9183                 |
| 4.7761  | 40     | 0.0184        | -                       | -                      | -                      | -                      | -                     | -                      |
| 4.8955  | 41     | -             | 0.9225                  | 0.9125                 | 0.9164                 | 0.9260                 | 0.8985                | 0.9220                 |
| 5.9701  | 50     | 0.0135        | 0.9268                  | 0.9132                 | 0.9208                 | 0.9257                 | 0.8961                | 0.9271                 |
| 6.9254  | 58     | -             | 0.9254                  | 0.9158                 | 0.9202                 | 0.9212                 | 0.8938                | 0.9213                 |
| 7.1642  | 60     | 0.0123        | -                       | -                      | -                      | -                      | -                     | -                      |
| **8.0** | **67** | **-**         | **0.9253**              | **0.916**              | **0.9228**             | **0.9207**             | **0.8972**            | **0.9243**             |
| 8.3582  | 70     | 0.01          | -                       | -                      | -                      | -                      | -                     | -                      |
| 8.9552  | 75     | -             | 0.9254                  | 0.9160                 | 0.9213                 | 0.9207                 | 0.9005                | 0.9245                 |
| 9.5522  | 80     | 0.0088        | 0.9254                  | 0.9160                 | 0.9228                 | 0.9207                 | 0.8975                | 0.9245                 |

* The bold row denotes the saved checkpoint.

### Framework Versions

- Python: 3.10.6
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.1.2+cu121
- Accelerate: 0.32.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```