CrossEncoder based on bansalaman18/bert-uncased_L-10_H-768_A-12

This is a Cross Encoder model finetuned from bansalaman18/bert-uncased_L-10_H-768_A-12 on the ms_marco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("rahulseetharaman/reranker-msmarco-v1.1-bert-uncased_L-10_H-768_A-12-listnet")
# Get scores for pairs of texts
pairs = [
    ['passport period for minors', 'It is especially important to check the passports of any minors who may be traveling. Passports for minors have a shorter validity period (5 years) than passports for adults (10 years) and thus may expire sooner. The Bureau of Consular Affairs has updated its Schengen Fact Sheet on www.travel.state.gov. '],
    ['passport period for minors', "1 For Minors eligibility to apply for 10 years validity Passport: Effective from 21 st May, 2015 On completion 15 Years of age irrespective of the Passport's place of Issue-in such cases it is mandatory to attach both Parents Passport Copy and Signature on the application. "],
    ['passport period for minors', 'By law, a valid unexpired U.S. passport (or passport card) is conclusive (and not just prima facie) proof of U.S. citizenship, and has the same force and effect as proof of United States citizenship as certificates of naturalization or of citizenship, if issued to a U.S. citizen for the full period allowed by law. American consular officials issued passports to some citizens of some of the thirteen states during the War for Independence (1775–1783). Passports were sheets of paper printed on one side, included a description of the bearer, and were valid for three to six months.'],
    ['passport period for minors', "The validity of a minor's passport is restricted to five years or till they attain the age of 18, whichever is earlier. But the minors aged between 15 to 18 years can apply either for a 10-year validity passport or for a passport which is valid till they attain the age of 18 years. But the minors between 15 to 18 years of age can apply either for a 10-year validity passport or for a passport which is valid till they attain the age of 18 years. Different fees are applicable depending upon the category they are applying for."],
    ['passport period for minors', 'A: Minors between 15 to 18 years of age can apply either for a 10-year validity passport or they can apply for a passport, which is valid till they attain the age of 18 years. Fee for a 10-year validity passport is higher than fee for a passport, which is valid till they attain the age of 18 years. But the minors between 15 to 18 years of age can apply either for a 10-year validity passport or for a passport which is valid till they attain the age of 18 years. Different fees are applicable depending upon the category they are applying for.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'passport period for minors',
    [
        'It is especially important to check the passports of any minors who may be traveling. Passports for minors have a shorter validity period (5 years) than passports for adults (10 years) and thus may expire sooner. The Bureau of Consular Affairs has updated its Schengen Fact Sheet on www.travel.state.gov. ',
        "1 For Minors eligibility to apply for 10 years validity Passport: Effective from 21 st May, 2015 On completion 15 Years of age irrespective of the Passport's place of Issue-in such cases it is mandatory to attach both Parents Passport Copy and Signature on the application. ",
        'By law, a valid unexpired U.S. passport (or passport card) is conclusive (and not just prima facie) proof of U.S. citizenship, and has the same force and effect as proof of United States citizenship as certificates of naturalization or of citizenship, if issued to a U.S. citizen for the full period allowed by law. American consular officials issued passports to some citizens of some of the thirteen states during the War for Independence (1775–1783). Passports were sheets of paper printed on one side, included a description of the bearer, and were valid for three to six months.',
        "The validity of a minor's passport is restricted to five years or till they attain the age of 18, whichever is earlier. But the minors aged between 15 to 18 years can apply either for a 10-year validity passport or for a passport which is valid till they attain the age of 18 years. But the minors between 15 to 18 years of age can apply either for a 10-year validity passport or for a passport which is valid till they attain the age of 18 years. Different fees are applicable depending upon the category they are applying for.",
        'A: Minors between 15 to 18 years of age can apply either for a 10-year validity passport or they can apply for a passport, which is valid till they attain the age of 18 years. Fee for a 10-year validity passport is higher than fee for a passport, which is valid till they attain the age of 18 years. But the minors between 15 to 18 years of age can apply either for a 10-year validity passport or for a passport which is valid till they attain the age of 18 years. Different fees are applicable depending upon the category they are applying for.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
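
The list returned by `rank` has the same shape you would get by sorting the raw `predict` scores yourself. The following is a minimal sketch of that step in plain Python. `rank_from_scores` is a hypothetical helper introduced here for illustration (it is not part of the library), and the scores are made-up values, not outputs of this model.

```python
def rank_from_scores(scores):
    """Return [{'corpus_id': i, 'score': s}, ...] sorted by descending score."""
    # Sort document indices so the highest-scoring document comes first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


# Illustrative scores only; real values would come from model.predict(pairs).
example_scores = [0.12, -0.40, 0.03, 0.57, 0.31]
ranking = rank_from_scores(example_scores)
print([entry["corpus_id"] for entry in ranking])
```

Each `corpus_id` is the position of the document in the input list, so the ranking can be mapped back to the original texts by index.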

Evaluation

Metrics

Cross Encoder Reranking

  • Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100 and NanoNQ_R100
  • Evaluated with CrossEncoderRerankingEvaluator with these parameters:
    {
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric   NanoMSMARCO_R100  NanoNFCorpus_R100  NanoNQ_R100
map      0.0653 (-0.4242)  0.2812 (+0.0202)   0.0593 (-0.3603)
mrr@10   0.0404 (-0.4371)  0.3626 (-0.1372)   0.0446 (-0.3821)
ndcg@10  0.0585 (-0.4820)  0.2553 (-0.0697)   0.0517 (-0.4490)
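
For readers unfamiliar with the metric names, here is a minimal sketch of MRR@10 and NDCG@10 for a single query, using the standard textbook definitions with binary relevance labels. The function names are hypothetical, and the evaluator's own implementation may differ in details such as tie handling.

```python
import math


def mrr_at_k(ranked_labels, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(labels, k=10):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(label / math.log2(rank + 1)
               for rank, label in enumerate(labels[:k], start=1))


def ndcg_at_k(ranked_labels, k=10):
    """DCG of the given order divided by the DCG of the ideal order."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


# Labels in the order the reranker returned the documents (illustrative).
ranked = [0, 0, 1, 0, 0]
print(mrr_at_k(ranked), ndcg_at_k(ranked))
```

The reported numbers are averages of such per-query values over each dataset.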

Cross Encoder Nano BEIR

  • Dataset: NanoBEIR_R100_mean
  • Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nfcorpus",
            "nq"
        ],
        "rerank_k": 100,
        "at_k": 10,
        "always_rerank_positives": true
    }
    
Metric   Value
map      0.1353 (-0.2548)
mrr@10   0.1492 (-0.3188)
ndcg@10  0.1218 (-0.3335)
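
The NanoBEIR_R100_mean values are consistent with a plain arithmetic mean of the three per-dataset results reported above. A short check, using only numbers from this card:

```python
# Per-dataset values in the order NanoMSMARCO_R100, NanoNFCorpus_R100, NanoNQ_R100.
per_dataset = {
    "map": [0.0653, 0.2812, 0.0593],
    "mrr@10": [0.0404, 0.3626, 0.0446],
    "ndcg@10": [0.0585, 0.2553, 0.0517],
}

# Arithmetic mean per metric, rounded to the four decimals used in the card.
means = {name: round(sum(values) / len(values), 4)
         for name, values in per_dataset.items()}
print(means)
```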

Training Details

Training Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 78,704 training samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
              query             docs           labels
    type      string            list           list
    min       8 characters      2 elements     2 elements
    mean      34.09 characters  6.00 elements  6.00 elements
    max       91 characters     10 elements    10 elements
  • Samples:
    query docs labels
    what is a geotechnical engineer ['Geotechnical engineering is the branch of civil engineering concerned with the engineering behavior of earth materials. Geotechnical engineering is also related to coastal and ocean engineering. Coastal engineering can involve the design and construction of wharves, marinas, and jetties. Ocean engineering can involve foundation and anchor systems for offshore structures such as oil platforms.', 'A geotechnical engineer’s top priority must be to recognize differences in soil and rock properties, evaluate the engineering properties of the rock and soil on the site, and determine the suitable design and construction method, which is simultaneously cost effective, durable, and safe. Important Aspects of Geotechnical Engineering. Soil Mechanics. Soil mechanics is a major field of the geotechnical engineering in which soil is analyzed prior to any major construction, so as to ensure its suitability to support the load of the desired structures.', 'Geotechnical engineering is a sub-discipli... [1, 0, 0, 0, 0, ...]
    how to make coconut jam drops ["1 Combine sifted flour, coconut and sugar in a large mixing bowl. 2 Mix well. 3 Make a well in the flour mixture and stir in melted butter and milk. 4 Mix well. ( 5 Don't worry if the batter looks a bit crumbly.) Using your hands, shape, rather than roll, level tablespoons of the mixture into balls. 1 Preheat oven to 180C (350F). 2 Combine sifted flour, coconut and sugar in a large mixing bowl. 3 Mix well. 4 Make a well in the flour mixture and stir in melted butter and milk. 5 Mix well. ( 6", 'Method. 1 Beat butter and sugar with an electric mixer until pale and creamy, add egg and vanilla extract and beat until light and fluffy, turn speed to low and add sifted flour, baking powder and 1/2 cup coconut, beat until just combined into a soft dough. 1 2 tablespoons raspberry jam. 2 Beat butter and sugar with an electric mixer until pale and creamy, add egg and vanilla extract and beat until light and fluffy, turn speed to low and add sifted flour, baking powder and 1/2 cup coc... [1, 0, 0, 0, 0, ...]
    are organism species italicized ['Genus and species names are always italicized when printed; the names of other taxa (families, etc.) are not. When a species (or several species of the same genus) is mentioned repeatedly, the genus may be abbreviated after its first mention, as in Q. alba. Binomial Nomenclature. The present system of binomial nomenclature identifies each species by a scientific name of two words, Latin in form and usually derived from Greek or Latin roots. The first name (capitalized) is the genus of the organism, the second (not capitalized) is its species', 'Using italics to name the genus and species is a standard practice in biological nomenclature. It serves to set biological names apart from other parts of a text. Ex. Homo sapiens Also, one must always capitalize the genus and all other taxa (kingdom, phylum, etc.) except for species, which is always lowercase. ', 'The scientific name of a species is formed by the combination of two terms The first name (capitalized) is the genus of the organi... [1, 0, 0, 0, 0, ...]
  • Loss: ListNetLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "mini_batch_size": 16
    }
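
To make the loss concrete, here is a minimal sketch of the ListNet top-one objective from the cited paper: the cross-entropy between the softmax of the labels and the softmax of the predicted scores for one query. It is written in plain Python for readability and assumes an identity activation, as in the parameters above. It is not the library's implementation, which operates on batched tensors.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(scores, labels):
    """Cross-entropy between softmax(labels) and softmax(scores) for one list."""
    target = softmax(labels)
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


labels = [1, 0, 0, 0, 0]
uniform_loss = listnet_loss([0.0] * 5, labels)                  # equals log(5)
matched_loss = listnet_loss([1.0, 0.0, 0.0, 0.0, 0.0], labels)  # scores equal labels
print(uniform_loss, matched_loss)
```

Two properties follow from the definition: a model that scores all n documents equally has loss log(n), and the loss is smallest when the score differences match the label differences.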
    

Evaluation Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 1,000 evaluation samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
              query             docs           labels
    type      string            list           list
    min       8 characters      3 elements     3 elements
    mean      33.85 characters  6.50 elements  6.50 elements
    max       89 characters     10 elements    10 elements
  • Samples:
    query docs labels
    passport period for minors ['It is especially important to check the passports of any minors who may be traveling. Passports for minors have a shorter validity period (5 years) than passports for adults (10 years) and thus may expire sooner. The Bureau of Consular Affairs has updated its Schengen Fact Sheet on www.travel.state.gov. ', "1 For Minors eligibility to apply for 10 years validity Passport: Effective from 21 st May, 2015 On completion 15 Years of age irrespective of the Passport's place of Issue-in such cases it is mandatory to attach both Parents Passport Copy and Signature on the application. ", 'By law, a valid unexpired U.S. passport (or passport card) is conclusive (and not just prima facie) proof of U.S. citizenship, and has the same force and effect as proof of United States citizenship as certificates of naturalization or of citizenship, if issued to a U.S. citizen for the full period allowed by law. American consular officials issued passports to some citizens of some of the thirteen states du... [1, 0, 0, 0, 0, ...]
    what is degenerative skull ['Cervical Degenerative Disc Disease. The cervical spine consists of the first seven vertebrae running from the base of the skull to the chest. Sandwiched in between each of these vertebrae is a disc that is made of a gel-like material (the nucleus pulposus) enclosed within a more rigid covering, the annulus fibrosis. These discs act to cushion the vertebrae and absorb shock.', 'Nearly everyone shows some signs of wear and tear on the spinal discs as they age. Not everyone, however, will have symptoms described as degenerative disc disease. Not actually a disease, degenerative disc disease refers to a condition in which pain is caused from a damaged disc. A wide range of symptoms and severity is associated with this condition', 'The most common and obvious symptoms of cervical degenerative disc disease are neck pain and a stiff neck. When one of these conditions presses on one or more of the many nerves running through the spinal cord, you also can develop pain, numbness, or weakness r... [1, 0, 0, 0, 0, ...]
    why is my dell screen green ['When a monitor display shows only a green image, it is usually because the monitor cable is loose. Display signals are outputted in three primary colors: red, green and blue. A monitor may display only green because the red and blue inputs are not connected.', 'Method 2: Also check with the monitor settings. Check with the monitor settings using the monitor menu options using buttons on the Monitor and make sure that the Red, Green and Blue Color settings are proper. You may check the computer or device manual or contact them for more help. Method 3: You may also refer to the following article and check.', 'Right now my whole screen is a lime green color with horizontal lines running across it. I tried to unpluge the monitor but the same thing happened. I left my PC on the whole night and when I turn on the... Source(s): pc monitor screen green additional details inside: https://tr.im/n5ofC.', 'Check if the screen flickers in BIOS. If the screen flickers in BIOS, run the diagnostics on your system: 1 Restart your system and keep tapping “F12” key as soon as the Dell logo appears. 2 Select “Diagnostics” using the arrow keys and press enter. 3 Please check if the screen flickers while running the diagnostics.', 'Power up the computer and LCD and see if the problem goes away. If you still get a green colored screen, if you can, try plugging the screen into a different computer. This will let you find out if the issue is with the screen or the graphics card on the PC.'] [1, 0, 0, 0, 0]
  • Loss: ListNetLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "mini_batch_size": 16
    }
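
Each record in these datasets holds one query, a list of candidate documents, and a parallel list of labels. The sketch below shows one way to expand such a record into the (query, document) pairs that `predict` accepts. `record_to_pairs` is a hypothetical helper, and the document strings are shortened placeholders rather than real dataset text.

```python
def record_to_pairs(record):
    """Expand one {query, docs, labels} record into (query, doc) pairs and labels."""
    if len(record["docs"]) != len(record["labels"]):
        raise ValueError("docs and labels must have the same length")
    pairs = [(record["query"], doc) for doc in record["docs"]]
    return pairs, list(record["labels"])


# Shortened, illustrative record; the docs are placeholders.
record = {
    "query": "why is my dell screen green",
    "docs": [
        "placeholder passage about a loose monitor cable",
        "placeholder passage about monitor color settings",
        "placeholder passage about BIOS diagnostics",
    ],
    "labels": [1, 0, 0],
}
pairs, labels = record_to_pairs(record)
print(len(pairs), labels)
```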
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • load_best_model_at_end: True
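
These settings imply a specific step count and learning-rate curve. The sketch below works through the arithmetic, assuming a single device, one training sample per query list, and the linear warmup-then-decay shape used by the `linear` scheduler. The step count it derives (4919) agrees with the training log, where step 4750 falls at epoch 0.9656.

```python
import math

num_samples = 78_704   # training set size from this card
batch_size = 16        # per_device_train_batch_size (single device assumed)
base_lr = 2e-5         # learning_rate
warmup_ratio = 0.1     # warmup_ratio

total_steps = math.ceil(num_samples / batch_size)     # one epoch of optimizer steps
warmup_steps = math.ceil(total_steps * warmup_ratio)  # steps spent ramping up


def lr_at(step):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(total_steps, warmup_steps, lr_at(warmup_steps))
```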

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss  Validation Loss  NanoMSMARCO_R100_ndcg@10  NanoNFCorpus_R100_ndcg@10  NanoNQ_R100_ndcg@10  NanoBEIR_R100_mean_ndcg@10
-1      -1    -              -                0.0413 (-0.4991)          0.2376 (-0.0875)           0.0761 (-0.4245)     0.1183 (-0.3370)
0.0002  1     2.0923         -                -                         -                          -                    -
0.0508  250   2.0923         -                -                         -                          -                    -
0.1016  500   2.0925         2.0835           0.0304 (-0.5101)          0.2401 (-0.0849)           0.0153 (-0.4853)     0.0953 (-0.3601)
0.1525  750   2.0861         -                -                         -                          -                    -
0.2033  1000  2.0886         2.0828           0.0484 (-0.4920)          0.2345 (-0.0905)           0.0284 (-0.4722)     0.1038 (-0.3516)
0.2541  1250  2.0895         -                -                         -                          -                    -
0.3049  1500  2.0842         2.0828           0.0359 (-0.5045)          0.2276 (-0.0975)           0.0365 (-0.4641)     0.1000 (-0.3554)
0.3558  1750  2.0858         -                -                         -                          -                    -
0.4066  2000  2.0851         2.0820           0.0505 (-0.4899)          0.2312 (-0.0939)           0.0444 (-0.4562)     0.1087 (-0.3467)
0.4574  2250  2.0948         -                -                         -                          -                    -
0.5082  2500  2.0844         2.0817           0.0585 (-0.4820)          0.2553 (-0.0697)           0.0517 (-0.4490)     0.1218 (-0.3335)
0.5591  2750  2.0854         -                -                         -                          -                    -
0.6099  3000  2.0878         2.0815           0.0502 (-0.4902)          0.2614 (-0.0636)           0.0444 (-0.4563)     0.1187 (-0.3367)
0.6607  3250  2.0854         -                -                         -                          -                    -
0.7115  3500  2.0888         2.0813           0.0420 (-0.4984)          0.2600 (-0.0650)           0.0580 (-0.4427)     0.1200 (-0.3354)
0.7624  3750  2.0842         -                -                         -                          -                    -
0.8132  4000  2.0838         2.0812           0.0435 (-0.4970)          0.2630 (-0.0620)           0.0504 (-0.4503)     0.1190 (-0.3364)
0.8640  4250  2.0866         -                -                         -                          -                    -
0.9148  4500  2.089          2.0811           0.0451 (-0.4954)          0.2598 (-0.0653)           0.0511 (-0.4495)     0.1186 (-0.3367)
0.9656  4750  2.0892         -                -                         -                          -                    -
-1      -1    -              -                0.0585 (-0.4820)          0.2553 (-0.0697)           0.0517 (-0.4490)     0.1218 (-0.3335)
  • The saved checkpoint corresponds to the step 2500 row (shown in bold in the original table); its metrics match the final evaluation row.
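
With `load_best_model_at_end` enabled, the trainer keeps the checkpoint that scores best on a selection metric. The card does not state which metric was used, so the sketch below assumes it was the mean NDCG@10 column. Under that assumption, picking the maximum over the in-training evaluations in the log above selects step 2500.

```python
# (step, NanoBEIR_R100_mean_ndcg@10) for each in-training evaluation in the log.
eval_log = [
    (500, 0.0953), (1000, 0.1038), (1500, 0.1000),
    (2000, 0.1087), (2500, 0.1218), (3000, 0.1187),
    (3500, 0.1200), (4000, 0.1190), (4500, 0.1186),
]

# Pick the evaluation with the highest mean NDCG@10 (assumed selection metric).
best_step, best_score = max(eval_log, key=lambda row: row[1])
print(best_step, best_score)
```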

Framework Versions

  • Python: 3.10.18
  • Sentence Transformers: 5.0.0
  • Transformers: 4.56.0.dev0
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to Rank: From Pairwise Approach to Listwise Approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}

Model size: 95.3M params (Safetensors), tensor type: F32