CrossEncoder based on bansalaman18/bert-uncased_L-12_H-512_A-8

This is a Cross Encoder model fine-tuned from bansalaman18/bert-uncased_L-12_H-512_A-8 on the ms_marco dataset using the sentence-transformers library. It computes a relevance score for each pair of texts; these scores can be used for text reranking and semantic search.

Model Details

Model Description

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("rahulseetharaman/reranker-msmarco-v1.1-bert-uncased_L-12_H-512_A-8-listnet")
# Get scores for pairs of texts
pairs = [
    ['what is lactate dehydrogenase', 'Lactate dehydrogenase (LDH) is an enzyme that helps facilitate the process of turning sugar into energy for your cells to use. LDH is present in many kinds of organs and tissues throughout the body, including the liver, heart, pancreas, kidneys, skeletal muscles, brain, and blood cells. When illness or injury damages your cells, LDH may be released into the bloodstream, causing the level of LDH in your blood to rise.'],
    ['what is lactate dehydrogenase', 'A lactate dehydrogenase (LDH or LD) is an enzyme found in nearly all living cells (animals, plants, and prokaryotes). LDH catalyzes the conversion of pyruvate to lactate and back, as it converts NADH to NAD + and back. A dehydrogenase is an enzyme that transfers a hydride from one molecule to another. LDH exist in four distinct enzyme classes. This article is about the common NAD(P)-dependent L-lactate dehydrogenase. Tissue breakdown releases LDH, and therefore LDH can be measured as a surrogate for tissue breakdown, e.g. hemolysis. LDH is measured by the lactate dehydrogenase (LDH) test (also known as the LDH test or Lactic acid dehydrogenase test).'],
    ['what is lactate dehydrogenase', 'Lactic Acid Dehydrogenase (LDH). Guide. Lactic acid dehydrogenase (LDH) is an enzyme that helps produce energy. It is present in almost all of the tissues in the body and its levels rise in response to cell damage. LDH levels are measured from a sample of blood taken from a vein. '],
    ['what is lactate dehydrogenase', 'Lactate dehydrogenase deficiency is a condition that affects how the body breaks down sugar to use as energy in cells, primarily muscle cells. There are two types of this condition: lactate dehydrogenase-A deficiency (sometimes called glycogen storage disease XI) and lactate dehydrogenase-B deficiency. In some people with lactate dehydrogenase-A deficiency, high-intensity exercise or other strenuous activity leads to the breakdown of muscle tissue (rhabdomyolysis). The destruction of muscle tissue releases a protein called myoglobin, which is processed by the kidneys and released in the urine (myoglobinuria).'],
    ['what is lactate dehydrogenase', 'Summary. The protein encoded by this gene catalyzes the conversion of L-lactate and NAD to pyruvate and NADH in the final step of anaerobic glycolysis. The protein is found predominantly in muscle tissue and belongs to the lactate dehydrogenase family. Mutations in this gene have been linked to exertional myoglobinuria. '],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'what is lactate dehydrogenase',
    [
        'Lactate dehydrogenase (LDH) is an enzyme that helps facilitate the process of turning sugar into energy for your cells to use. LDH is present in many kinds of organs and tissues throughout the body, including the liver, heart, pancreas, kidneys, skeletal muscles, brain, and blood cells. When illness or injury damages your cells, LDH may be released into the bloodstream, causing the level of LDH in your blood to rise.',
        'A lactate dehydrogenase (LDH or LD) is an enzyme found in nearly all living cells (animals, plants, and prokaryotes). LDH catalyzes the conversion of pyruvate to lactate and back, as it converts NADH to NAD + and back. A dehydrogenase is an enzyme that transfers a hydride from one molecule to another. LDH exist in four distinct enzyme classes. This article is about the common NAD(P)-dependent L-lactate dehydrogenase. Tissue breakdown releases LDH, and therefore LDH can be measured as a surrogate for tissue breakdown, e.g. hemolysis. LDH is measured by the lactate dehydrogenase (LDH) test (also known as the LDH test or Lactic acid dehydrogenase test).',
        'Lactic Acid Dehydrogenase (LDH). Guide. Lactic acid dehydrogenase (LDH) is an enzyme that helps produce energy. It is present in almost all of the tissues in the body and its levels rise in response to cell damage. LDH levels are measured from a sample of blood taken from a vein. ',
        'Lactate dehydrogenase deficiency is a condition that affects how the body breaks down sugar to use as energy in cells, primarily muscle cells. There are two types of this condition: lactate dehydrogenase-A deficiency (sometimes called glycogen storage disease XI) and lactate dehydrogenase-B deficiency. In some people with lactate dehydrogenase-A deficiency, high-intensity exercise or other strenuous activity leads to the breakdown of muscle tissue (rhabdomyolysis). The destruction of muscle tissue releases a protein called myoglobin, which is processed by the kidneys and released in the urine (myoglobinuria).',
        'Summary. The protein encoded by this gene catalyzes the conversion of L-lactate and NAD to pyruvate and NADH in the final step of anaerobic glycolysis. The protein is found predominantly in muscle tissue and belongs to the lactate dehydrogenase family. Mutations in this gene have been linked to exertional myoglobinuria. ',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
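
The list returned by rank can also be rebuilt from raw predict scores. The following is a minimal sketch under stated assumptions: the scores are placeholder values rather than real model output, and rank_from_scores is a hypothetical helper, not part of sentence-transformers.

```python
from typing import Dict, List, Optional, Sequence, Union


def rank_from_scores(
    scores: Sequence[float], top_k: Optional[int] = None
) -> List[Dict[str, Union[int, float]]]:
    """Sort document indices by descending score, mirroring the shape of the rank output."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


# Placeholder scores for five documents (illustrative values, not model output).
example_scores = [0.12, 0.87, 0.45, 0.03, 0.60]
ranking = rank_from_scores(example_scores, top_k=3)
print(ranking)
```

Each corpus_id is the position of the document in the input list, so the original text can be recovered by indexing back into that list.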

Evaluation

Metrics

Cross Encoder Reranking

  • Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
  • Evaluated with CrossEncoderRerankingEvaluator with these parameters:
    {
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric   NanoMSMARCO_R100  NanoNFCorpus_R100  NanoNQ_R100
map      0.0687 (-0.4208)  0.2704 (+0.0094)   0.0560 (-0.3636)
mrr@10   0.0446 (-0.4329)  0.4075 (-0.0923)   0.0367 (-0.3900)
ndcg@10  0.0620 (-0.4785)  0.2659 (-0.0591)   0.0681 (-0.4325)
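
For readers unfamiliar with the rank-based metrics above, here is a minimal sketch of MRR@10 and nDCG@10 on a single query with binary relevance labels. The label list is hypothetical, and the functions use the common linear-gain definitions; the evaluator's own implementation may differ in details such as tie handling.

```python
import math
from typing import Sequence


def mrr_at_k(ranked_labels: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k, else 0."""
    for position, label in enumerate(ranked_labels[:k], start=1):
        if label > 0:
            return 1.0 / position
    return 0.0


def dcg_at_k(labels: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(
        label / math.log2(position + 1)
        for position, label in enumerate(labels[:k], start=1)
    )


def ndcg_at_k(ranked_labels: Sequence[int], k: int = 10) -> float:
    """DCG of the given order divided by the DCG of the ideal (label-sorted) order."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


# Hypothetical labels listed in model-score order: the only relevant
# document was placed third.
ranked = [0, 0, 1, 0, 0]
print(mrr_at_k(ranked), ndcg_at_k(ranked))
```

The reported figures are averages of such per-query values over each dataset.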

Cross Encoder Nano BEIR

  • Dataset: NanoBEIR_R100_mean
  • Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ],
        "rerank_k": 100,
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric   Value
map      0.1317 (-0.2583)
mrr@10   0.1629 (-0.3051)
ndcg@10  0.1320 (-0.3234)
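
The NanoBEIR_R100_mean values are consistent with an unweighted mean of the three per-dataset scores reported under Cross Encoder Reranking. A short check, assuming simple averaging and rounding to four decimals:

```python
from statistics import mean

# Per-dataset scores in the order NanoMSMARCO_R100, NanoNFCorpus_R100, NanoNQ_R100.
per_dataset = {
    "map": [0.0687, 0.2704, 0.0560],
    "mrr@10": [0.0446, 0.4075, 0.0367],
    "ndcg@10": [0.0620, 0.2659, 0.0681],
}

# Unweighted mean over the three datasets, rounded like the reported values.
nano_beir_mean = {name: round(mean(values), 4) for name, values in per_dataset.items()}
print(nano_beir_mean)
```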

Training Details

Training Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 78,704 training samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
               query             docs           labels
    type       string            list           list
    min        11 characters     4 elements     4 elements
    mean       34.03 characters  7.00 elements  7.00 elements
    max        103 characters    10 elements    10 elements
  • Samples:
    query docs labels
    define wear ['Wear is related to interactions between surfaces and specifically the removal and deformation of material on a surface as a result of mechanical action of the opposite surface.', 'n a loss of tooth substance in contact areas through functional wear and friction, resulting in broadening and flattening of the contacts and a decrease in the mesiodistal dimension of the teeth and the dentition as a whole. wear, occlusal, n attritional loss of substance on opposing occlusal units or surfaces.', 'Wear is defined as to have on the body or to reduce the quality of the appearance by constant use. 1 An example of wear is to have on a pair of sunglasses. 2 An example of wear is to wear a hole in the elbow of a jacket.', 'Street Wear. Street wear is defined as west coast skateboarding styles. A lot of street wear companies are based out of the west coast and focus on the styles a classic skateboarder would wear. This includes fitted pants, normally classic vans, screen printed large tees, and ... [1, 0, 0, 0, 0, ...]
    eschooltoday stem cells ['In genetic terms, stem cells are cells in the embryo that are not specialized. After fertilization, there are two types of cells in the embryo. Specialized cells: These are the cells modified with clearly defined instructions or tasks. They are the cells that go on to define set things like taste, hearing, sex and the like. As they divide and grow, they do NOT change into any kind of cell. These are cells in the embryo (just after fertilization), usually obtained from human embryos that are a few days old and are left over from human fertility treatments. These are somewhat ‘generic cells’ and can grow into any of the about 250 cell types in the human body. This type is called Stem Cell.', "Photosynthesis is a chemical process through which plants, some bacteria and algae, produce glucose and oxygen from carbon dioxide and water, using only light as a source of energy. This process is extremely important for life on earth as it provides the oxygen that all other life depend on. Just ... [1, 0, 0, 0, 0, ...]
    does a presidential candidate have to be born in the u s ["Republican U.S. Sen. Ted Cruz, a Tea Party favorite who is widely seen as a potential presidential candidate in the 2016 election, was born in Calgary, Canada. Because his mother was a citizen of the United States, Cruz has maintained he also is a natural born citizen of the United States. You don't have to be born in the United States to be eligible to serve as president of the United States as long as one of more of your parents were American citizens at the time of birth, it is commonly held. The Congressional Research Service concluded in 2011 :", 'His mother was born in Delaware. The family returned to the United States when Cruz was 4. The Constitution gives three eligibility requirements to be president: one must be 35 years of age, a resident within the United States for 14 years, and a natural born Citizen, a term not defined in the Constitution.', "Conventional wisdom holds that candidates for president must be born on U.S. soil to serve in the highest office in the land. T... [1, 0, 0, 0, 0, ...]
  • Loss: ListNetLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "mini_batch_size": 16
    }
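
To illustrate the objective, here is a minimal pure-Python sketch of the top-one ListNet loss from Cao et al. (2007): the cross-entropy between the softmax of the labels and the softmax of the predicted scores for one query. It is not the library's ListNetLoss implementation, and the scores and labels are made-up values. With an identity activation_fn, the raw model outputs would presumably be used as the scores directly.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(scores: Sequence[float], labels: Sequence[float]) -> float:
    """Cross-entropy between top-one probabilities of labels and of scores."""
    target = softmax(labels)
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


# Hypothetical scores for three documents of one query; the first is relevant.
labels = [1, 0, 0]
good = listnet_loss([2.0, 0.5, 0.1], labels)
bad = listnet_loss([0.1, 0.5, 2.0], labels)
print(good, bad)
```

The loss is lower when the relevant document receives the highest score, which is what pushes the model toward correct orderings within each list.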
    

Evaluation Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 1,000 evaluation samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
               query             docs           labels
    type       string            list           list
    min        11 characters     2 elements     2 elements
    mean       33.88 characters  6.00 elements  6.00 elements
    max        105 characters    10 elements    10 elements
  • Samples:
    query docs labels
    what is lactate dehydrogenase ['Lactate dehydrogenase (LDH) is an enzyme that helps facilitate the process of turning sugar into energy for your cells to use. LDH is present in many kinds of organs and tissues throughout the body, including the liver, heart, pancreas, kidneys, skeletal muscles, brain, and blood cells. When illness or injury damages your cells, LDH may be released into the bloodstream, causing the level of LDH in your blood to rise.', 'A lactate dehydrogenase (LDH or LD) is an enzyme found in nearly all living cells (animals, plants, and prokaryotes). LDH catalyzes the conversion of pyruvate to lactate and back, as it converts NADH to NAD + and back. A dehydrogenase is an enzyme that transfers a hydride from one molecule to another. LDH exist in four distinct enzyme classes. This article is about the common NAD(P)-dependent L-lactate dehydrogenase. Tissue breakdown releases LDH, and therefore LDH can be measured as a surrogate for tissue breakdown, e.g. hemolysis. LDH is measured by the lactate dehy... [1, 0, 0, 0, 0, ...]
    how is platinum produced ['Platinum is found uncombined in alluvial deposits. Most commercially produced platinum comes from South Africa, from the mineral cooperite (platinum sulfide). Some platinum is prepared as a by-product of copper and nickel refining. Platinum is used in the chemicals industry as a catalyst for the production of nitric acid, silicone and benzene. It is also used as a catalyst to improve the efficiency of fuel cells. The electronics industry uses platinum for computer hard disks and thermocouples.', 'Platinum is a chemical element with symbol Pt and atomic number 78. It is a dense, malleable, ductile, highly unreactive, precious, gray-white transition metal. Platinum has six naturally occurring isotopes: 190 Pt, 192 Pt, 194 Pt, 195 Pt, 196 Pt, and 198 Pt. The most abundant of these is 195 Pt, comprising 33.83% of all platinum.', 'Welcome to Platinum Produce. Platinum Produce Company is a hydroponic greenhouse located in Blenheim, Ontario. Platinum Produce was started in 2003 and has grow... [1, 0, 0, 0, 0, ...]
    accounting process definition ['(The Accounting Cycle). The accounting process is a series of activities that begins with a transaction and ends with the closing of the books. Because this process is repeated each reporting period, it is referred to as the accounting cycle and includes these major steps: Identify the transaction or other recognizable event. ', 'Closing Process. The accounting closing process, also called closing the books, is the steps required to prepare accounts for financial statement preparation and the start of the next accounting period. The closing process consists of steps to transfer temporary account balances to permanent accounts and make the general ledger ready for the next accounting period. The closing process consists of three main steps: 1 Identify temporary accounts that need to be close', 'The steps required for individual transactions in the accounting process are: 1 Identify the transaction. 2 First, determine what kind of transaction it may be. 3 Examples are buying goods ... [1, 0, 0, 0, 0, ...]
  • Loss: ListNetLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "mini_batch_size": 16
    }
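
Each record in this listwise format holds one query, a list of candidate documents, and a parallel list of labels. The sketch below shows how such a record could be expanded into the [query, document] pairs that predict expects. The record is a shortened, illustrative version of one sample, and to_pairs is a hypothetical helper rather than a library function.

```python
from typing import Dict, List, Tuple

# Shortened record following the query / docs / labels columns (illustrative).
sample: Dict[str, object] = {
    "query": "how is platinum produced",
    "docs": [
        "Platinum is found uncombined in alluvial deposits.",
        "Platinum is a chemical element with symbol Pt and atomic number 78.",
        "Welcome to Platinum Produce.",
    ],
    "labels": [1, 0, 0],
}


def to_pairs(record: Dict[str, object]) -> Tuple[List[List[str]], List[int]]:
    """Expand one listwise record into [query, doc] pairs plus aligned labels."""
    docs = list(record["docs"])
    labels = list(record["labels"])
    if len(docs) != len(labels):
        raise ValueError("docs and labels must have the same length")
    pairs = [[str(record["query"]), doc] for doc in docs]
    return pairs, labels


pairs, pair_labels = to_pairs(sample)
print(len(pairs), pair_labels)
```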
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • load_best_model_at_end: True
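
These settings determine the length of the training schedule. The arithmetic below is a sketch that assumes a single device (num_devices is an assumption, not stated in this card) and takes the training set size and gradient_accumulation_steps from elsewhere in this card.

```python
import math

train_samples = 78_704            # size of the training dataset
per_device_train_batch_size = 16
gradient_accumulation_steps = 1   # listed under All Hyperparameters
num_train_epochs = 1
warmup_ratio = 0.1
num_devices = 1                   # assumption: single-device training

# Samples consumed per optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
# Number of steps over which the learning rate ramps up linearly.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the run has 4,919 optimizer steps, which agrees with the epoch fractions in the training logs (for example, step 250 at epoch 0.0508).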

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss  Validation Loss  NanoMSMARCO_R100_ndcg@10  NanoNFCorpus_R100_ndcg@10  NanoNQ_R100_ndcg@10  NanoBEIR_R100_mean_ndcg@10
-1      -1    -              -                0.0359 (-0.5045)          0.2933 (-0.0318)           0.0371 (-0.4635)     0.1221 (-0.3333)
0.0002  1     2.173          -                -                         -                          -                    -
0.0508  250   2.089          -                -                         -                          -                    -
0.1016  500   2.0896         2.0896           0.0416 (-0.4988)          0.2979 (-0.0271)           0.0164 (-0.4842)     0.1186 (-0.3367)
0.1525  750   2.0909         -                -                         -                          -                    -
0.2033  1000  2.095          2.0888           0.0521 (-0.4883)          0.2484 (-0.0767)           0.0736 (-0.4270)     0.1247 (-0.3307)
0.2541  1250  2.0841         -                -                         -                          -                    -
0.3049  1500  2.0862         2.0881           0.0527 (-0.4877)          0.2643 (-0.0607)           0.0590 (-0.4416)     0.1253 (-0.3300)
0.3558  1750  2.0871         -                -                         -                          -                    -
0.4066  2000  2.0885         2.0878           0.0547 (-0.4857)          0.2587 (-0.0663)           0.0693 (-0.4314)     0.1276 (-0.3278)
0.4574  2250  2.085          -                -                         -                          -                    -
0.5082  2500  2.0898         2.0878           0.0459 (-0.4945)          0.2493 (-0.0757)           0.0521 (-0.4485)     0.1158 (-0.3396)
0.5591  2750  2.0835         -                -                         -                          -                    -
0.6099  3000  2.0882         2.0884           0.0648 (-0.4756)          0.2549 (-0.0701)           0.0567 (-0.4440)     0.1255 (-0.3299)
0.6607  3250  2.0868         -                -                         -                          -                    -
0.7115  3500  2.0845         2.0872           0.0679 (-0.4725)          0.2479 (-0.0772)           0.0692 (-0.4314)     0.1283 (-0.3270)
0.7624  3750  2.0886         -                -                         -                          -                    -
0.8132  4000  2.0827         2.0873           0.0635 (-0.4769)          0.2589 (-0.0661)           0.0699 (-0.4308)     0.1308 (-0.3246)
0.8640  4250  2.0852         -                -                         -                          -                    -
0.9148  4500  2.0838         2.0871           0.0620 (-0.4785)          0.2659 (-0.0591)           0.0681 (-0.4325)     0.1320 (-0.3234)  *
0.9656  4750  2.0831         -                -                         -                          -                    -
-1      -1    -              -                0.0620 (-0.4785)          0.2659 (-0.0591)           0.0681 (-0.4325)     0.1320 (-0.3234)
  • The row marked * denotes the saved checkpoint.
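
With load_best_model_at_end enabled, the saved checkpoint can be traced from the logged validation losses. The sketch below assumes selection by lowest validation loss, which is the Trainer default when no other metric is configured; this card does not state the selection metric explicitly.

```python
# (step, validation loss) pairs copied from the training log above.
eval_log = [
    (500, 2.0896),
    (1000, 2.0888),
    (1500, 2.0881),
    (2000, 2.0878),
    (2500, 2.0878),
    (3000, 2.0884),
    (3500, 2.0872),
    (4000, 2.0873),
    (4500, 2.0871),
]

# Pick the evaluation step with the smallest validation loss.
best_step, best_loss = min(eval_log, key=lambda row: row[1])
print(best_step, best_loss)
```

Step 4500 has the lowest validation loss, and its NanoBEIR scores match the final evaluation row of the log.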

Framework Versions

  • Python: 3.10.18
  • Sentence Transformers: 5.0.0
  • Transformers: 4.56.0.dev0
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to Rank: From Pairwise Approach to Listwise Approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}

Model size: 54M params (Safetensors, tensor type F32)