CocoRoF/ModernBERT-SimCSE

@@ -6,7 +6,7 @@ tags:
 - generated_from_trainer
 - dataset_size:392702
 - loss:CosineSimilarityLoss
-base_model: x2bee/KoModernBERT-base-mlm-v03-ckp00
 widget:
 - source_sentence: 우리는 움직이는 동행 우주 정지 좌표계에 비례하여 이동하고 있습니다 ... 약 371km / s에서 별자리 leo
     쪽으로. "
@@ -61,34 +61,34 @@ model-index:
       type: sts_dev
     metrics:
     - type: pearson_cosine
-      value: 0.6463764324668821
       name: Pearson Cosine
     - type: spearman_cosine
-      value: 0.668749120795344
       name: Spearman Cosine
     - type: pearson_euclidean
-      value: 0.6434649881382908
       name: Pearson Euclidean
     - type: spearman_euclidean
-      value: 0.6535107003038169
       name: Spearman Euclidean
     - type: pearson_manhattan
-      value: 0.6516759845194007
       name: Pearson Manhattan
     - type: spearman_manhattan
-      value: 0.6679435004022668
       name: Spearman Manhattan
     - type: pearson_dot
-      value: 0.6306152465572834
       name: Pearson Dot
     - type: spearman_dot
-      value: 0.6496717700503837
       name: Spearman Dot
     - type: pearson_max
-      value: 0.6516759845194007
       name: Pearson Max
     - type: spearman_max
-      value: 0.668749120795344
       name: Spearman Max
 ---
@@ -192,16 +192,16 @@ You can finetune this model on your own dataset.
 | Metric             | Value      |
 |:-------------------|:-----------|
-| pearson_cosine     | 0.6464     |
-| spearman_cosine    | 0.6687     |
-| pearson_euclidean  | 0.6435     |
-| spearman_euclidean | 0.6535     |
-| pearson_manhattan  | 0.6517     |
-| spearman_manhattan | 0.6679     |
-| pearson_dot        | 0.6306     |
-| spearman_dot       | 0.6497     |
-| pearson_max        | 0.6517     |
-| **spearman_max**   | **0.6687** |
 <!--
 ## Bias, Risks and Limitations
@@ -267,237 +267,6 @@ You can finetune this model on your own dataset.
   }
   ```
-### Training Hyperparameters
-#### Non-Default Hyperparameters
-- `overwrite_output_dir`: True
-- `eval_strategy`: steps
-- `per_device_train_batch_size`: 16
-- `per_device_eval_batch_size`: 16
-- `gradient_accumulation_steps`: 8
-- `warmup_ratio`: 0.1
-- `push_to_hub`: True
-- `hub_model_id`: x2bee/sts_nli_tune_test
-- `hub_strategy`: checkpoint
-- `batch_sampler`: no_duplicates
-#### All Hyperparameters
-<details><summary>Click to expand</summary>
-- `overwrite_output_dir`: True
-- `do_predict`: False
-- `eval_strategy`: steps
-- `prediction_loss_only`: True
-- `per_device_train_batch_size`: 16
-- `per_device_eval_batch_size`: 16
-- `per_gpu_train_batch_size`: None
-- `per_gpu_eval_batch_size`: None
-- `gradient_accumulation_steps`: 8
-- `eval_accumulation_steps`: None
-- `torch_empty_cache_steps`: None
-- `learning_rate`: 5e-05
-- `weight_decay`: 0.0
-- `adam_beta1`: 0.9
-- `adam_beta2`: 0.999
-- `adam_epsilon`: 1e-08
-- `max_grad_norm`: 1.0
-- `num_train_epochs`: 3.0
-- `max_steps`: -1
-- `lr_scheduler_type`: linear
-- `lr_scheduler_kwargs`: {}
-- `warmup_ratio`: 0.1
-- `warmup_steps`: 0
-- `log_level`: passive
-- `log_level_replica`: warning
-- `log_on_each_node`: True
-- `logging_nan_inf_filter`: True
-- `save_safetensors`: True
-- `save_on_each_node`: False
-- `save_only_model`: False
-- `restore_callback_states_from_checkpoint`: False
-- `no_cuda`: False
-- `use_cpu`: False
-- `use_mps_device`: False
-- `seed`: 42
-- `data_seed`: None
-- `jit_mode_eval`: False
-- `use_ipex`: False
-- `bf16`: False
-- `fp16`: False
-- `fp16_opt_level`: O1
-- `half_precision_backend`: auto
-- `bf16_full_eval`: False
-- `fp16_full_eval`: False
-- `tf32`: None
-- `local_rank`: 0
-- `ddp_backend`: None
-- `tpu_num_cores`: None
-- `tpu_metrics_debug`: False
-- `debug`: []
-- `dataloader_drop_last`: True
-- `dataloader_num_workers`: 0
-- `dataloader_prefetch_factor`: None
-- `past_index`: -1
-- `disable_tqdm`: False
-- `remove_unused_columns`: True
-- `label_names`: None
-- `load_best_model_at_end`: False
-- `ignore_data_skip`: False
-- `fsdp`: []
-- `fsdp_min_num_params`: 0
-- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
-- `fsdp_transformer_layer_cls_to_wrap`: None
-- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
-- `deepspeed`: None
-- `label_smoothing_factor`: 0.0
-- `optim`: adamw_torch
-- `optim_args`: None
-- `adafactor`: False
-- `group_by_length`: False
-- `length_column_name`: length
-- `ddp_find_unused_parameters`: None
-- `ddp_bucket_cap_mb`: None
-- `ddp_broadcast_buffers`: False
-- `dataloader_pin_memory`: True
-- `dataloader_persistent_workers`: False
-- `skip_memory_metrics`: True
-- `use_legacy_prediction_loop`: False
-- `push_to_hub`: True
-- `resume_from_checkpoint`: None
-- `hub_model_id`: x2bee/sts_nli_tune_test
-- `hub_strategy`: checkpoint
-- `hub_private_repo`: None
-- `hub_always_push`: False
-- `gradient_checkpointing`: False
-- `gradient_checkpointing_kwargs`: None
-- `include_inputs_for_metrics`: False
-- `include_for_metrics`: []
-- `eval_do_concat_batches`: True
-- `fp16_backend`: auto
-- `push_to_hub_model_id`: None
-- `push_to_hub_organization`: None
-- `mp_parameters`:
-- `auto_find_batch_size`: False
-- `full_determinism`: False
-- `torchdynamo`: None
-- `ray_scope`: last
-- `ddp_timeout`: 1800
-- `torch_compile`: False
-- `torch_compile_backend`: None
-- `torch_compile_mode`: None
-- `dispatch_batches`: None
-- `split_batches`: None
-- `include_tokens_per_second`: False
-- `include_num_input_tokens_seen`: False
-- `neftune_noise_alpha`: None
-- `optim_target_modules`: None
-- `batch_eval_metrics`: False
-- `eval_on_start`: False
-- `use_liger_kernel`: False
-- `eval_use_gather_object`: False
-- `average_tokens_across_devices`: False
-- `prompts`: None
-- `batch_sampler`: no_duplicates
-- `multi_dataset_batch_sampler`: proportional
-</details>
-### Training Logs
-| Epoch  | Step | Training Loss | Validation Loss | sts_dev_spearman_max |
-|:------:|:----:|:-------------:|:---------------:|:--------------------:|
-| 0.0326 | 25   | 0.3733        | -               | -                    |
-| 0.0652 | 50   | 0.362         | -               | -                    |
-| 0.0978 | 75   | 0.3543        | -               | -                    |
-| 0.1304 | 100  | 0.3431        | -               | -                    |
-| 0.1630 | 125  | 0.3273        | -               | -                    |
-| 0.1956 | 150  | 0.2745        | -               | -                    |
-| 0.2282 | 175  | 0.2061        | -               | -                    |
-| 0.2608 | 200  | 0.1814        | -               | -                    |
-| 0.2934 | 225  | 0.1658        | -               | -                    |
-| 0.3260 | 250  | 0.1637        | -               | -                    |
-| 0.3586 | 275  | 0.1542        | -               | -                    |
-| 0.3912 | 300  | 0.147         | -               | -                    |
-| 0.4238 | 325  | 0.1392        | -               | -                    |
-| 0.4564 | 350  | 0.1329        | -               | -                    |
-| 0.4890 | 375  | 0.131         | -               | -                    |
-| 0.5216 | 400  | 0.1294        | -               | -                    |
-| 0.5542 | 425  | 0.1245        | -               | -                    |
-| 0.5868 | 450  | 0.1243        | -               | -                    |
-| 0.6194 | 475  | 0.1237        | -               | -                    |
-| 0.6520 | 500  | 0.1236        | 0.0956          | 0.5284               |
-| 0.6846 | 525  | 0.1183        | -               | -                    |
-| 0.7172 | 550  | 0.1166        | -               | -                    |
-| 0.7498 | 575  | 0.1176        | -               | -                    |
-| 0.7824 | 600  | 0.1144        | -               | -                    |
-| 0.8150 | 625  | 0.1141        | -               | -                    |
-| 0.8476 | 650  | 0.1093        | -               | -                    |
-| 0.8802 | 675  | 0.1081        | -               | -                    |
-| 0.9128 | 700  | 0.1082        | -               | -                    |
-| 0.9454 | 725  | 0.1078        | -               | -                    |
-| 0.9780 | 750  | 0.1039        | -               | -                    |
-| 1.0117 | 775  | 0.1106        | -               | -                    |
-| 1.0443 | 800  | 0.1113        | -               | -                    |
-| 1.0769 | 825  | 0.1113        | -               | -                    |
-| 1.1095 | 850  | 0.1103        | -               | -                    |
-| 1.1421 | 875  | 0.1098        | -               | -                    |
-| 1.1747 | 900  | 0.1118        | -               | -                    |
-| 1.2073 | 925  | 0.1085        | -               | -                    |
-| 1.2399 | 950  | 0.1057        | -               | -                    |
-| 1.2725 | 975  | 0.1081        | -               | -                    |
-| 1.3051 | 1000 | 0.1052        | 0.0930          | 0.5830               |
-| 1.3377 | 1025 | 0.1087        | -               | -                    |
-| 1.3703 | 1050 | 0.1046        | -               | -                    |
-| 1.4029 | 1075 | 0.1032        | -               | -                    |
-| 1.4355 | 1100 | 0.1037        | -               | -                    |
-| 1.4681 | 1125 | 0.1026        | -               | -                    |
-| 1.5007 | 1150 | 0.1036        | -               | -                    |
-| 1.5333 | 1175 | 0.102         | -               | -                    |
-| 1.5659 | 1200 | 0.101         | -               | -                    |
-| 1.5985 | 1225 | 0.1014        | -               | -                    |
-| 1.6311 | 1250 | 0.1024        | -               | -                    |
-| 1.6637 | 1275 | 0.1005        | -               | -                    |
-| 1.6963 | 1300 | 0.0993        | -               | -                    |
-| 1.7289 | 1325 | 0.0982        | -               | -                    |
-| 1.7615 | 1350 | 0.0988        | -               | -                    |
-| 1.7941 | 1375 | 0.0965        | -               | -                    |
-| 1.8267 | 1400 | 0.0984        | -               | -                    |
-| 1.8593 | 1425 | 0.0936        | -               | -                    |
-| 1.8919 | 1450 | 0.0924        | -               | -                    |
-| 1.9245 | 1475 | 0.0956        | -               | -                    |
-| 1.9571 | 1500 | 0.0927        | 0.0732          | 0.6470               |
-| 1.9897 | 1525 | 0.0915        | -               | -                    |
-| 2.0235 | 1550 | 0.0991        | -               | -                    |
-| 2.0561 | 1575 | 0.097         | -               | -                    |
-| 2.0887 | 1600 | 0.0957        | -               | -                    |
-| 2.1213 | 1625 | 0.0968        | -               | -                    |
-| 2.1539 | 1650 | 0.0968        | -               | -                    |
-| 2.1865 | 1675 | 0.0973        | -               | -                    |
-| 2.2191 | 1700 | 0.0936        | -               | -                    |
-| 2.2517 | 1725 | 0.0955        | -               | -                    |
-| 2.2843 | 1750 | 0.0942        | -               | -                    |
-| 2.3169 | 1775 | 0.0939        | -               | -                    |
-| 2.3495 | 1800 | 0.0947        | -               | -                    |
-| 2.3821 | 1825 | 0.0934        | -               | -                    |
-| 2.4147 | 1850 | 0.0919        | -               | -                    |
-| 2.4473 | 1875 | 0.0919        | -               | -                    |
-| 2.4799 | 1900 | 0.0928        | -               | -                    |
-| 2.5125 | 1925 | 0.0927        | -               | -                    |
-| 2.5451 | 1950 | 0.0899        | -               | -                    |
-| 2.5777 | 1975 | 0.0911        | -               | -                    |
-| 2.6103 | 2000 | 0.0915        | 0.0671          | 0.6687               |
-| 2.6429 | 2025 | 0.0905        | -               | -                    |
-| 2.6755 | 2050 | 0.0894        | -               | -                    |
-| 2.7081 | 2075 | 0.0887        | -               | -                    |
-| 2.7407 | 2100 | 0.0903        | -               | -                    |
-| 2.7733 | 2125 | 0.0887        | -               | -                    |
-| 2.8059 | 2150 | 0.0869        | -               | -                    |
-| 2.8385 | 2175 | 0.0871        | -               | -                    |
-| 2.8711 | 2200 | 0.0843        | -               | -                    |
-| 2.9037 | 2225 | 0.0838        | -               | -                    |
-| 2.9363 | 2250 | 0.0864        | -               | -                    |
-| 2.9689 | 2275 | 0.0831        | -               | -                    |
 ### Framework Versions
 - Python: 3.11.10
 - Sentence Transformers: 3.3.1

 - generated_from_trainer
 - dataset_size:392702
 - loss:CosineSimilarityLoss
+base_model: answerdotai/ModernBERT-base
 widget:
 - source_sentence: 우리는 움직이는 동행 우주 정지 좌표계에 비례하여 이동하고 있습니다 ... 약 371km / s에서 별자리 leo
     쪽으로. "
       type: sts_dev
     metrics:
     - type: pearson_cosine
+      value: 0.8273878707711191
       name: Pearson Cosine
     - type: spearman_cosine
+      value: 0.8298080691919564
       name: Spearman Cosine
     - type: pearson_euclidean
+      value: 0.8112987734110177
       name: Pearson Euclidean
     - type: spearman_euclidean
+      value: 0.8214596205940881
       name: Spearman Euclidean
     - type: pearson_manhattan
+      value: 0.8125188338482303
       name: Pearson Manhattan
     - type: spearman_manhattan
+      value: 0.8226861322419045
       name: Spearman Manhattan
     - type: pearson_dot
+      value: 0.7646820898603437
       name: Pearson Dot
     - type: spearman_dot
+      value: 0.7648333772102188
       name: Spearman Dot
     - type: pearson_max
+      value: 0.8273878707711191
       name: Pearson Max
     - type: spearman_max
+      value: 0.8298080691919564
       name: Spearman Max
 ---
 | Metric             | Value      |
 |:-------------------|:-----------|
+| pearson_cosine     | 0.8274     |
+| spearman_cosine    | 0.8298     |
+| pearson_euclidean  | 0.8113     |
+| spearman_euclidean | 0.8215     |
+| pearson_manhattan  | 0.8125     |
+| spearman_manhattan | 0.8227     |
+| pearson_dot        | 0.7647     |
+| spearman_dot       | 0.7648     |
+| pearson_max        | 0.8274     |
+| **spearman_max**   | **0.8298** |
 <!--
 ## Bias, Risks and Limitations
   }
   ```
 ### Framework Versions
 - Python: 3.11.10
 - Sentence Transformers: 3.3.1