stella-tuned-rirag / README.md
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:29547
  - loss:MultipleNegativesRankingLoss
base_model: dunzhang/stella_en_400M_v5
widget:
  - source_sentence: >-
      When calculating regulatory capital, which guidance note outlines the
      potential for an increased valuation adjustment for less liquid positions
      that may surpass the adjustments made for financial reporting purposes?
    sentences:
      - >
        REGULATORY REQUIREMENTS - SPOT COMMODITY ACTIVITIES

        Spot Commodities and Accepted Spot Commodities

        Authorised Persons will need to submit the details of how each Accepted
        Spot Commodity that is proposed to be used meets the requirements for
        the purposes of COBS Rule 22.2.2 and paragraphs 25 and 26 above.  The
        use of each Accepted Spot Commodity will be approved as part of the
        formal application process for review and approval of an FSP.  Though an
        Authorised Person may, for example, propose to admit to trading a
        commonly traded Spot Commodity, the Authorised Person’s controls
        relating to responsible and sustainable sourcing, and sound delivery
        mechanisms may not yet be fully developed.  In such circumstances, the
        FSRA may require the Authorised Person to delay the commencement of
        trading until such time that suitable controls have been developed and
        implemented.
      - >+
        Adjustment to the current valuation of less liquid positions for
        regulatory capital purposes. The adjustment to the current valuation of
        less liquid positions made under Guidance note 11 is likely to impact
        minimum Capital Requirements and may exceed those valuation adjustments
        made under the International Financial Reporting Standards and Guidance
        notes 8 and 9.

      - "REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED ACTIVITIES IN RELATION TO VIRTUAL ASSETS\nAnti-Money Laundering and Countering Financing of Terrorism\nIn order to develop a robust and sustainable regulatory framework for Virtual Assets, FSRA is of the view that a comprehensive application of its AML/CFT framework should be in place, including full compliance with, among other things, the:\n\na)\tUAE AML/CFT Federal Laws, including the UAE Cabinet Resolution No. (10) of 2019 Concerning the Executive Regulation of the Federal Law No. 20 of 2018 concerning Anti-Money Laundering and Combating Terrorism Financing;\n\nb)\tUAE Cabinet Resolution 20 of 2019 concerning the procedures of dealing with those listed under the UN sanctions list and UAE/local terrorist lists issued by the Cabinet, including the FSRA AML and Sanctions Rules and Guidance (“AML Rules”) or such other AML rules as may be applicable in ADGM from time to time; and\n\nc)\tadoption of international best practices (including the FATF Recommendations).\n"
  - source_sentence: >-
      Are there any ADGM-specific guidelines or best practices for integrating
      anti-money laundering (AML) compliance into our technology and financial
      systems to manage operational risks effectively?
    sentences:
      - >
        REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED
        ACTIVITIES IN RELATION TO VIRTUAL ASSETS

        Security measures and procedures

        IT infrastructures should be strong enough to resist, without
        significant loss to Clients, a number of scenarios, including but not
        limited to: accidental destruction or breach of data, collusion or
        leakage of information by employees/former employees, successful hack of
        a cryptographic and hardware security module or server, or access by
        hackers of any single set of encryption/decryption keys that could
        result in a complete system breach.
      - >-
        A Relevant Person may use a database maintained elsewhere for an
        up-to-date list of resolutions and Sanctions, or to perform checks of
        customers or transactions against that list. For example, it may wish to
        use a database maintained by its head office or a Group member. However,
        the Relevant Person retains responsibility for ensuring that its systems
        and controls are effective to ensure compliance with this Rulebook.
      - >
        DIGITAL SECURITIES SETTLEMENT

        Digital Settlement Facilities (DSFs)

        For the purposes of this Guidance and distinct from RCHs, the FSRA will
        consider DSFs suitable for the purposes of settlement (MIR Rule 3.8) and
        custody (MIR Rule 2.10) of Digital Securities. A DSF, holding an FSP for
        Providing Custody, may provide custody and settlement services in
        Digital Securities for RIEs and MTFs (as applicable).  Therefore, for
        the purposes of custody and settlement of Digital Securities, the
        arrangements that a RIE or MTF would normally have in place with a RCH
        can be replaced with arrangements provided by a DSF, provided that
        certain requirements, as described in this section, are met.
  - source_sentence: >-
      In the context of the Risk-Based Approach (RBA), how should a Relevant
      Person prioritize and address the risks once they have been identified and
      assessed?
    sentences:
      - >-
        If the Regulator considers that an auditor or actuary has committed a
        contravention of these Regulations, it may disqualify the auditor or
        actuary from being the auditor of, or (as the case may be), from acting
        as an actuary for, any Authorised Person, Recognised Body or Reporting
        Entity or any particular class thereof.
      - >-
        The Regulator shall have the power to require an Institution in
        Resolution, or any of its Group Entities, to provide any services or
        facilities (excluding any financial support) that are necessary to
        enable the Recipient to operate the transferred business effectively,
        including where the Institution under Resolution or relevant Group
        Entity has entered into Insolvency Proceedings.
      - >-
        In addition to assessing risk arising from money laundering, a business
        risk assessment should assess the potential exposure of a Relevant
        Person to other Financial Crime, such as fraud and the theft of personal
        data. The business risk assessment should also address the Relevant
        Person’s potential exposure to cyber security risk, as this risk may
        have a material impact on the Relevant Person’s ability to prevent
        Financial Crime.
  - source_sentence: >-
      Can you provide further clarification on the specific measures deemed
      adequate for handling conflicts of interest related to the provision and
      management of credit within an Authorised Person's organization?
    sentences:
      - >-
        An Authorised Person with one or more branches outside the ADGM must
        implement and maintain Credit Risk policies adapted to each local market
        and its regulatory conditions.
      - "In addition, applications for recognition as a Remote Investment Exchange or Remote Clearing House must contain:\n(a)\tthe address of the Applicant's head office in its home jurisdiction;\n(b)\tthe address of a place in the Abu Dhabi Global Market for the service on the Applicant of notices or other documents required or authorised to be served on it;\n(c)\tinformation identifying any type of activity which the Applicant envisages undertaking in the Abu Dhabi Global Market and the extent and nature of usage and membership;\n(d)\ta comparative analysis of the Applicant's regulatory requirements in its home jurisdiction compared against those under the Rules set out in this Rulebook and those contained in the “Principles for Financial Market Infrastructures” issued by IOSCO and the Committee on Payment and Settlement Systems (April 2012);\n(e)\tthe information, evidence and explanatory material necessary to demonstrate to the Regulator that the requirements specified in Rule ‎7.2.2 are met;\n(f)\tone copy of each of the following documents:\n(i)\tits most recent financial statements; and\n(ii)\tthe Applicant’s memorandum and articles of association or any similar documents; and\n(g)\tthe date by which the Applicant wishes the Recognition Order to take effect."
      - >-
        Financial risk . All applicants are required to demonstrate they have a
        sound initial capital base and funding and must be able to meet the
        relevant prudential requirements of ADGM legislation, on an ongoing
        basis. This includes holding enough capital resources to cover expenses
        even if expected revenue takes time to materialise. Start-ups can
        encounter greater financial risks as they seek to establish and grow a
        new business.
  - source_sentence: >-
      What are the recommended best practices for ensuring that all disclosures
      are prepared in accordance with the PRMS, and how can we validate that our
      classification and reporting of Petroleum Resources meet the standards set
      forth?
    sentences:
      - >-
        Notwithstanding this Rule, an Authorised Person would generally be
        expected to separate the roles of Compliance Officer and Senior
        Executive Officer. In addition, the roles of Compliance Officer, Finance
        Officer and Money Laundering Reporting Officer would not be expected to
        be combined with any other Controlled Functions unless appropriate
        monitoring and control arrangements independent of the individual
        concerned will be implemented by the Authorised Person. This may be
        possible in the case of a Branch, where monitoring and controlling of
        the individual (carrying out more than one role in the Branch) is
        conducted from the Authorised Person's home state by an appropriate
        individual for each of the relevant Controlled Functions as applicable.
        However, it is recognised that, on a case by case basis, there may be
        exceptional circumstances in which this may not always be practical or
        possible.
      - >
        DISCLOSURE REQUIREMENTS .

        Material Exploration and drilling results

        Rule 12.5.1 sets out the reporting requirements relevant to disclosures
        of material Exploration and drilling results in relation to Petroleum
        Resources.  Such disclosures should be presented in a factual and
        balanced manner, and contain sufficient information to allow investors
        and their advisers to make an informed judgement of its materiality. 
        Care needs to be taken to ensure that a disclosure does not suggest,
        without reasonable grounds, that commercially recoverable or potentially
        recoverable quantities of Petroleum have been discovered, in the absence
        of determining and disclosing estimates of Petroleum Resources in
        accordance with Chapter 12 and the PRMS.
      - >
        REGULATORY REQUIREMENTS FOR AUTHORISED PERSONS ENGAGED IN REGULATED
        ACTIVITIES IN RELATION TO VIRTUAL ASSETS

        Origin and destination of Virtual Asset funds

        Currently, there are technology solutions developed in-house and
        available from third party service providers which enable the tracking
        of Virtual Assets through multiple transactions to more accurately
        identify the source and destination of these Virtual Assets. It is
        expected that Authorised Persons may need to consider the use of such
        solutions and other systems to adequately meet their anti-money
        laundering, financial crime and know-your-customer obligations under the
        Virtual Asset Framework.
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on dunzhang/stella_en_400M_v5

This is a sentence-transformers model finetuned from dunzhang/stella_en_400M_v5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: dunzhang/stella_en_400M_v5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
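
The Pooling and Dense stages listed above can be illustrated with a minimal pure-Python sketch. It assumes mean pooling that skips padded positions (as `pooling_mode_mean_tokens: True` suggests) followed by an affine map with identity activation. The helper names `mean_pool` and `dense`, the toy 2-dimensional vectors, and the weights are hypothetical; the real model uses 1024-dimensional tensors and learned weights.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors, ignoring padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an input that consists only of padding.
    count = max(count, 1)
    return [value / count for value in total]


def dense(vector: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """Affine map y = W x + b with identity activation (no non-linearity)."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weight, bias)]


# Toy input: three token positions, the last one is padding (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)          # average of the first two vectors
weight = [[1.0, 0.0], [0.0, 1.0]]         # hypothetical 2x2 weight matrix
bias = [0.5, -0.5]                        # hypothetical bias
sentence_embedding = dense(pooled, weight, bias)
print(pooled, sentence_embedding)
```

The padded position contributes nothing to the average, which is why the large values in the third vector do not affect the result.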

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BelisaDi/stella-tuned-rirag")
# Run inference
sentences = [
    'What are the recommended best practices for ensuring that all disclosures are prepared in accordance with the PRMS, and how can we validate that our classification and reporting of Petroleum Resources meet the standards set forth?',
    'DISCLOSURE REQUIREMENTS .\nMaterial Exploration and drilling results\nRule 12.5.1 sets out the reporting requirements relevant to disclosures of material Exploration and drilling results in relation to Petroleum Resources.  Such disclosures should be presented in a factual and balanced manner, and contain sufficient information to allow investors and their advisers to make an informed judgement of its materiality.  Care needs to be taken to ensure that a disclosure does not suggest, without reasonable grounds, that commercially recoverable or potentially recoverable quantities of Petroleum have been discovered, in the absence of determining and disclosing estimates of Petroleum Resources in accordance with Chapter 12 and the PRMS.\n',
    "Notwithstanding this Rule, an Authorised Person would generally be expected to separate the roles of Compliance Officer and Senior Executive Officer. In addition, the roles of Compliance Officer, Finance Officer and Money Laundering Reporting Officer would not be expected to be combined with any other Controlled Functions unless appropriate monitoring and control arrangements independent of the individual concerned will be implemented by the Authorised Person. This may be possible in the case of a Branch, where monitoring and controlling of the individual (carrying out more than one role in the Branch) is conducted from the Authorised Person's home state by an appropriate individual for each of the relevant Controlled Functions as applicable. However, it is recognised that, on a case by case basis, there may be exceptional circumstances in which this may not always be practical or possible.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
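
For retrieval, the similarity scores are typically used to rank candidate passages against a query. The sketch below shows that ranking step in pure Python with cosine similarity, the similarity function listed for this model. The 3-dimensional vectors are hypothetical stand-ins for the 1024-dimensional output of `model.encode`, and `cosine_similarity` and `rank_passages` are illustrative helpers, not part of the Sentence Transformers API.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query_vec: List[float], passage_vecs: List[List[float]]) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings standing in for model.encode(...) output.
query_vec = [1.0, 0.0, 1.0]
passage_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query, different length
    [1.0, 1.0, 0.0],  # partial overlap with the query
]

ranking = rank_passages(query_vec, passage_vecs)
best_index = ranking[0][0]
print(ranking)
```

Because cosine similarity ignores vector length, the passage pointing in the same direction as the query ranks first even though its norm differs.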

Training Details

Training Dataset

Unnamed Dataset

  • Size: 29,547 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
                anchor          positive
    type        string          string
    min         15 tokens       14 tokens
    mean        34.89 tokens    115.67 tokens
    max         96 tokens       512 tokens
  • Samples:
    Sample 1
    anchor: Under Rules 7.3.2 and 7.3.3, what are the two specific conditions related to the maturity of a financial instrument that would trigger a disclosure requirement?
    positive: Events that trigger a disclosure. For the purposes of Rules 7.3.2 and 7.3.3, a Person is taken to hold Financial Instruments in or relating to a Reporting Entity, if the Person holds a Financial Instrument that on its maturity will confer on him:
    (1) an unconditional right to acquire the Financial Instrument; or
    (2) the discretion as to his right to acquire the Financial Instrument.

    Sample 2
    anchor: Best Execution and Transaction Handling: What constitutes 'Best Execution' under Rule 6.5 in the context of virtual assets, and how should Authorised Persons document and demonstrate this?
    positive: The following COBS Rules should be read as applying to all Transactions undertaken by an Authorised Person conducting a Regulated Activity in relation to Virtual Assets, irrespective of any restrictions on application or any exception to these Rules elsewhere in COBS -
    (a) Rule 3.4 (Suitability);
    (b) Rule 6.5 (Best Execution);
    (c) Rule 6.7 (Aggregation and Allocation);
    (d) Rule 6.10 (Confirmation Notes);
    (e) Rule 6.11 (Periodic Statements); and
    (f) Chapter 12 (Key Information and Client Agreement).

    Sample 3
    anchor: How does the FSRA define and evaluate "principal risks and uncertainties" for a Petroleum Reporting Entity, particularly for the remaining six months of the financial year?
    positive: A Reporting Entity must:
    (a) prepare such report:
    (i) for the first six months of each financial year or period, and if there is a change to the accounting reference date, prepare such report in respect of the period up to the old accounting reference date; and
    (ii) in accordance with the applicable IFRS standards or other standards acceptable to the Regulator;
    (b) ensure the financial statements have either been audited or reviewed by auditors, and the audit or review by the auditor is included within the report; and
    (c) ensure that the report includes:
    (i) except in the case of a Mining Exploration Reporting Entity or a Petroleum Exploration Reporting Entity, an indication of important events that have occurred during the first six months of the financial year, and their impact on the financial statements;
    (ii) except in the case of a Mining Exploration Reporting Entity or a Petroleum Exploration Reporting Entity, a description of the principal risks and uncertainties for the remaining six months of the financial year; and
    (iii) a condensed set of financial statements, an interim management report and associated responsibility statements.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
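
To make the loss parameters concrete, here is a minimal pure-Python sketch of an in-batch ranking loss of this kind: each anchor is scored against every positive in the batch using cosine similarity multiplied by `scale`, and cross-entropy is taken with the anchor's own positive as the target. It is an illustration under those assumptions, not the library's implementation; `cos_sim` and `mnr_loss` are hypothetical helper names and the 2-dimensional vectors are toy values.

```python
import math
from typing import List


def cos_sim(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors: List[List[float]], positives: List[List[float]], scale: float = 20.0) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(value - top) for value in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])  # each anchor matches its own positive
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])  # each anchor matches the other positive
print(aligned, swapped)
```

With `scale` set to 20.0, a cosine gap of 1.0 between the correct and incorrect positive becomes a logit gap of 20, so the aligned batch yields a loss near zero and the swapped batch a loss near 20.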
    

Training Hyperparameters

Non-Default Hyperparameters

  • learning_rate: 2e-05
  • auto_find_batch_size: True

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: True
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
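
The combination of `learning_rate: 2e-05`, `lr_scheduler_type: linear`, and `warmup_steps: 0` implies a learning rate that starts at its peak and decays linearly to zero. The sketch below shows that shape in pure Python. The function `linear_lr` is a hypothetical helper, and `TOTAL_STEPS` is an illustrative value, not the actual length of this training run.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-05, warmup_steps: int = 0) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup (unused when warmup_steps == 0).
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


TOTAL_STEPS = 10_000  # hypothetical run length, for illustration only

lr_start = linear_lr(0, TOTAL_STEPS)             # peak rate at the first step
lr_middle = linear_lr(5_000, TOTAL_STEPS)        # half the peak at the midpoint
lr_end = linear_lr(TOTAL_STEPS, TOTAL_STEPS)     # zero at the final step
print(lr_start, lr_middle, lr_end)
```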

Training Logs

Epoch    Step    Training Loss
0.1354   500     0.3078
0.2707   1000    0.3142
0.4061   1500    0.2546
0.5414   2000    0.2574
0.6768   2500    0.247
0.8121   3000    0.2532
0.9475   3500    0.2321
1.0828   4000    0.1794
1.2182   4500    0.1588
1.3535   5000    0.154
1.4889   5500    0.1592
1.6243   6000    0.1632
1.7596   6500    0.1471
1.8950   7000    0.1669
2.0303   7500    0.1368
2.1657   8000    0.0982
2.3010   8500    0.1125
2.4364   9000    0.089
2.5717   9500    0.0902
2.7071   10000   0.0867
2.8424   10500   0.1017
2.9778   11000   0.0835
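
The Epoch column can be reproduced from the step count with a short calculation. This sketch assumes a single device, an effective batch size of 8 (that is, `auto_find_batch_size` did not reduce `per_device_train_batch_size`), and that the final partial batch is kept, consistent with `dataloader_drop_last: False`. The constant and function names are illustrative.

```python
import math

DATASET_SIZE = 29_547   # training samples, from the dataset section
BATCH_SIZE = 8          # assumed effective batch size
NUM_EPOCHS = 3

# The last, partially filled batch still counts as a step.
steps_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps, rounded as in the log."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, total_steps)
print(epoch_at(500), epoch_at(4000), epoch_at(11000))
```

Under these assumptions the computed values agree with the logged epochs at steps 500, 4000, and 11000, which suggests the run did use a batch size of 8.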

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.5.0+cu124
  • Accelerate: 1.0.1
  • Datasets: 3.0.2
  • Tokenizers: 0.20.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}