CocoRoF committed on
Commit b35c11a · verified · 1 Parent(s): cb2afd5

Update README.md

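The metric values changed by this commit can be cross-checked with a short script. The sketch below is not part of the commit. It assumes that each `*_max` entry is the largest score across the cosine, Euclidean, Manhattan, and dot-product measures (which matches both the old and the new values in this diff) and that the README table is the metadata rounded to four decimals. The helper name `best_over_measures` is hypothetical.

```python
# Full-precision values copied from the "+" model-index lines of this commit.
new_metrics = {
    "pearson_cosine": 0.8273878707711191,
    "spearman_cosine": 0.8298080691919564,
    "pearson_euclidean": 0.8112987734110177,
    "spearman_euclidean": 0.8214596205940881,
    "pearson_manhattan": 0.8125188338482303,
    "spearman_manhattan": 0.8226861322419045,
    "pearson_dot": 0.7646820898603437,
    "spearman_dot": 0.7648333772102188,
}


def best_over_measures(metrics: dict[str, float], correlation: str) -> float:
    """Return the largest score among all similarity measures for one correlation type."""
    return max(v for k, v in metrics.items() if k.startswith(correlation + "_"))


# Assumption: pearson_max / spearman_max are maxima over the four measures.
pearson_max = best_over_measures(new_metrics, "pearson")
spearman_max = best_over_measures(new_metrics, "spearman")

# Round to four decimals, the precision used in the README metric table.
rounded = {name: f"{value:.4f}" for name, value in new_metrics.items()}

for name, text in rounded.items():
    print(f"{name:<20} {text}")
print(f"{'pearson_max':<20} {pearson_max:.4f}")
print(f"{'spearman_max':<20} {spearman_max:.4f}")
```

Comparing the printed values with the `+` rows of the metric table is a quick way to spot a row that was truncated or copied from a neighbouring row.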
  1. README.md +21 -252
README.md CHANGED
@@ -6,7 +6,7 @@ tags:
6
  - generated_from_trainer
7
  - dataset_size:392702
8
  - loss:CosineSimilarityLoss
9
- base_model: x2bee/KoModernBERT-base-mlm-v03-ckp00
10
  widget:
11
  - source_sentence: 우리는 움직이는 동행 우주 정지 좌표계에 비례하여 이동하고 있습니다 ... 약 371km / s에서 별자리 leo
12
  쪽으로. "
@@ -61,34 +61,34 @@ model-index:
61
  type: sts_dev
62
  metrics:
63
  - type: pearson_cosine
64
- value: 0.6463764324668821
65
  name: Pearson Cosine
66
  - type: spearman_cosine
67
- value: 0.668749120795344
68
  name: Spearman Cosine
69
  - type: pearson_euclidean
70
- value: 0.6434649881382908
71
  name: Pearson Euclidean
72
  - type: spearman_euclidean
73
- value: 0.6535107003038169
74
  name: Spearman Euclidean
75
  - type: pearson_manhattan
76
- value: 0.6516759845194007
77
  name: Pearson Manhattan
78
  - type: spearman_manhattan
79
- value: 0.6679435004022668
80
  name: Spearman Manhattan
81
  - type: pearson_dot
82
- value: 0.6306152465572834
83
  name: Pearson Dot
84
  - type: spearman_dot
85
- value: 0.6496717700503837
86
  name: Spearman Dot
87
  - type: pearson_max
88
- value: 0.6516759845194007
89
  name: Pearson Max
90
  - type: spearman_max
91
- value: 0.668749120795344
92
  name: Spearman Max
93
  ---
94
 
@@ -192,16 +192,16 @@ You can finetune this model on your own dataset.
192
 
193
  | Metric | Value |
194
  |:-------------------|:-----------|
195
- | pearson_cosine | 0.6464 |
196
- | spearman_cosine | 0.6687 |
197
- | pearson_euclidean | 0.6435 |
198
- | spearman_euclidean | 0.6535 |
199
- | pearson_manhattan | 0.6517 |
200
- | spearman_manhattan | 0.6679 |
201
- | pearson_dot | 0.6306 |
202
- | spearman_dot | 0.6497 |
203
- | pearson_max | 0.6517 |
204
- | **spearman_max** | **0.6687** |
205
 
206
  <!--
207
  ## Bias, Risks and Limitations
@@ -267,237 +267,6 @@ You can finetune this model on your own dataset.
267
  }
268
  ```
269
 
270
- ### Training Hyperparameters
271
- #### Non-Default Hyperparameters
272
-
273
- - `overwrite_output_dir`: True
274
- - `eval_strategy`: steps
275
- - `per_device_train_batch_size`: 16
276
- - `per_device_eval_batch_size`: 16
277
- - `gradient_accumulation_steps`: 8
278
- - `warmup_ratio`: 0.1
279
- - `push_to_hub`: True
280
- - `hub_model_id`: x2bee/sts_nli_tune_test
281
- - `hub_strategy`: checkpoint
282
- - `batch_sampler`: no_duplicates
283
-
284
- #### All Hyperparameters
285
- <details><summary>Click to expand</summary>
286
-
287
- - `overwrite_output_dir`: True
288
- - `do_predict`: False
289
- - `eval_strategy`: steps
290
- - `prediction_loss_only`: True
291
- - `per_device_train_batch_size`: 16
292
- - `per_device_eval_batch_size`: 16
293
- - `per_gpu_train_batch_size`: None
294
- - `per_gpu_eval_batch_size`: None
295
- - `gradient_accumulation_steps`: 8
296
- - `eval_accumulation_steps`: None
297
- - `torch_empty_cache_steps`: None
298
- - `learning_rate`: 5e-05
299
- - `weight_decay`: 0.0
300
- - `adam_beta1`: 0.9
301
- - `adam_beta2`: 0.999
302
- - `adam_epsilon`: 1e-08
303
- - `max_grad_norm`: 1.0
304
- - `num_train_epochs`: 3.0
305
- - `max_steps`: -1
306
- - `lr_scheduler_type`: linear
307
- - `lr_scheduler_kwargs`: {}
308
- - `warmup_ratio`: 0.1
309
- - `warmup_steps`: 0
310
- - `log_level`: passive
311
- - `log_level_replica`: warning
312
- - `log_on_each_node`: True
313
- - `logging_nan_inf_filter`: True
314
- - `save_safetensors`: True
315
- - `save_on_each_node`: False
316
- - `save_only_model`: False
317
- - `restore_callback_states_from_checkpoint`: False
318
- - `no_cuda`: False
319
- - `use_cpu`: False
320
- - `use_mps_device`: False
321
- - `seed`: 42
322
- - `data_seed`: None
323
- - `jit_mode_eval`: False
324
- - `use_ipex`: False
325
- - `bf16`: False
326
- - `fp16`: False
327
- - `fp16_opt_level`: O1
328
- - `half_precision_backend`: auto
329
- - `bf16_full_eval`: False
330
- - `fp16_full_eval`: False
331
- - `tf32`: None
332
- - `local_rank`: 0
333
- - `ddp_backend`: None
334
- - `tpu_num_cores`: None
335
- - `tpu_metrics_debug`: False
336
- - `debug`: []
337
- - `dataloader_drop_last`: True
338
- - `dataloader_num_workers`: 0
339
- - `dataloader_prefetch_factor`: None
340
- - `past_index`: -1
341
- - `disable_tqdm`: False
342
- - `remove_unused_columns`: True
343
- - `label_names`: None
344
- - `load_best_model_at_end`: False
345
- - `ignore_data_skip`: False
346
- - `fsdp`: []
347
- - `fsdp_min_num_params`: 0
348
- - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
349
- - `fsdp_transformer_layer_cls_to_wrap`: None
350
- - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
351
- - `deepspeed`: None
352
- - `label_smoothing_factor`: 0.0
353
- - `optim`: adamw_torch
354
- - `optim_args`: None
355
- - `adafactor`: False
356
- - `group_by_length`: False
357
- - `length_column_name`: length
358
- - `ddp_find_unused_parameters`: None
359
- - `ddp_bucket_cap_mb`: None
360
- - `ddp_broadcast_buffers`: False
361
- - `dataloader_pin_memory`: True
362
- - `dataloader_persistent_workers`: False
363
- - `skip_memory_metrics`: True
364
- - `use_legacy_prediction_loop`: False
365
- - `push_to_hub`: True
366
- - `resume_from_checkpoint`: None
367
- - `hub_model_id`: x2bee/sts_nli_tune_test
368
- - `hub_strategy`: checkpoint
369
- - `hub_private_repo`: None
370
- - `hub_always_push`: False
371
- - `gradient_checkpointing`: False
372
- - `gradient_checkpointing_kwargs`: None
373
- - `include_inputs_for_metrics`: False
374
- - `include_for_metrics`: []
375
- - `eval_do_concat_batches`: True
376
- - `fp16_backend`: auto
377
- - `push_to_hub_model_id`: None
378
- - `push_to_hub_organization`: None
379
- - `mp_parameters`:
380
- - `auto_find_batch_size`: False
381
- - `full_determinism`: False
382
- - `torchdynamo`: None
383
- - `ray_scope`: last
384
- - `ddp_timeout`: 1800
385
- - `torch_compile`: False
386
- - `torch_compile_backend`: None
387
- - `torch_compile_mode`: None
388
- - `dispatch_batches`: None
389
- - `split_batches`: None
390
- - `include_tokens_per_second`: False
391
- - `include_num_input_tokens_seen`: False
392
- - `neftune_noise_alpha`: None
393
- - `optim_target_modules`: None
394
- - `batch_eval_metrics`: False
395
- - `eval_on_start`: False
396
- - `use_liger_kernel`: False
397
- - `eval_use_gather_object`: False
398
- - `average_tokens_across_devices`: False
399
- - `prompts`: None
400
- - `batch_sampler`: no_duplicates
401
- - `multi_dataset_batch_sampler`: proportional
402
-
403
- </details>
404
-
405
- ### Training Logs
406
- | Epoch | Step | Training Loss | Validation Loss | sts_dev_spearman_max |
407
- |:------:|:----:|:-------------:|:---------------:|:--------------------:|
408
- | 0.0326 | 25 | 0.3733 | - | - |
409
- | 0.0652 | 50 | 0.362 | - | - |
410
- | 0.0978 | 75 | 0.3543 | - | - |
411
- | 0.1304 | 100 | 0.3431 | - | - |
412
- | 0.1630 | 125 | 0.3273 | - | - |
413
- | 0.1956 | 150 | 0.2745 | - | - |
414
- | 0.2282 | 175 | 0.2061 | - | - |
415
- | 0.2608 | 200 | 0.1814 | - | - |
416
- | 0.2934 | 225 | 0.1658 | - | - |
417
- | 0.3260 | 250 | 0.1637 | - | - |
418
- | 0.3586 | 275 | 0.1542 | - | - |
419
- | 0.3912 | 300 | 0.147 | - | - |
420
- | 0.4238 | 325 | 0.1392 | - | - |
421
- | 0.4564 | 350 | 0.1329 | - | - |
422
- | 0.4890 | 375 | 0.131 | - | - |
423
- | 0.5216 | 400 | 0.1294 | - | - |
424
- | 0.5542 | 425 | 0.1245 | - | - |
425
- | 0.5868 | 450 | 0.1243 | - | - |
426
- | 0.6194 | 475 | 0.1237 | - | - |
427
- | 0.6520 | 500 | 0.1236 | 0.0956 | 0.5284 |
428
- | 0.6846 | 525 | 0.1183 | - | - |
429
- | 0.7172 | 550 | 0.1166 | - | - |
430
- | 0.7498 | 575 | 0.1176 | - | - |
431
- | 0.7824 | 600 | 0.1144 | - | - |
432
- | 0.8150 | 625 | 0.1141 | - | - |
433
- | 0.8476 | 650 | 0.1093 | - | - |
434
- | 0.8802 | 675 | 0.1081 | - | - |
435
- | 0.9128 | 700 | 0.1082 | - | - |
436
- | 0.9454 | 725 | 0.1078 | - | - |
437
- | 0.9780 | 750 | 0.1039 | - | - |
438
- | 1.0117 | 775 | 0.1106 | - | - |
439
- | 1.0443 | 800 | 0.1113 | - | - |
440
- | 1.0769 | 825 | 0.1113 | - | - |
441
- | 1.1095 | 850 | 0.1103 | - | - |
442
- | 1.1421 | 875 | 0.1098 | - | - |
443
- | 1.1747 | 900 | 0.1118 | - | - |
444
- | 1.2073 | 925 | 0.1085 | - | - |
445
- | 1.2399 | 950 | 0.1057 | - | - |
446
- | 1.2725 | 975 | 0.1081 | - | - |
447
- | 1.3051 | 1000 | 0.1052 | 0.0930 | 0.5830 |
448
- | 1.3377 | 1025 | 0.1087 | - | - |
449
- | 1.3703 | 1050 | 0.1046 | - | - |
450
- | 1.4029 | 1075 | 0.1032 | - | - |
451
- | 1.4355 | 1100 | 0.1037 | - | - |
452
- | 1.4681 | 1125 | 0.1026 | - | - |
453
- | 1.5007 | 1150 | 0.1036 | - | - |
454
- | 1.5333 | 1175 | 0.102 | - | - |
455
- | 1.5659 | 1200 | 0.101 | - | - |
456
- | 1.5985 | 1225 | 0.1014 | - | - |
457
- | 1.6311 | 1250 | 0.1024 | - | - |
458
- | 1.6637 | 1275 | 0.1005 | - | - |
459
- | 1.6963 | 1300 | 0.0993 | - | - |
460
- | 1.7289 | 1325 | 0.0982 | - | - |
461
- | 1.7615 | 1350 | 0.0988 | - | - |
462
- | 1.7941 | 1375 | 0.0965 | - | - |
463
- | 1.8267 | 1400 | 0.0984 | - | - |
464
- | 1.8593 | 1425 | 0.0936 | - | - |
465
- | 1.8919 | 1450 | 0.0924 | - | - |
466
- | 1.9245 | 1475 | 0.0956 | - | - |
467
- | 1.9571 | 1500 | 0.0927 | 0.0732 | 0.6470 |
468
- | 1.9897 | 1525 | 0.0915 | - | - |
469
- | 2.0235 | 1550 | 0.0991 | - | - |
470
- | 2.0561 | 1575 | 0.097 | - | - |
471
- | 2.0887 | 1600 | 0.0957 | - | - |
472
- | 2.1213 | 1625 | 0.0968 | - | - |
473
- | 2.1539 | 1650 | 0.0968 | - | - |
474
- | 2.1865 | 1675 | 0.0973 | - | - |
475
- | 2.2191 | 1700 | 0.0936 | - | - |
476
- | 2.2517 | 1725 | 0.0955 | - | - |
477
- | 2.2843 | 1750 | 0.0942 | - | - |
478
- | 2.3169 | 1775 | 0.0939 | - | - |
479
- | 2.3495 | 1800 | 0.0947 | - | - |
480
- | 2.3821 | 1825 | 0.0934 | - | - |
481
- | 2.4147 | 1850 | 0.0919 | - | - |
482
- | 2.4473 | 1875 | 0.0919 | - | - |
483
- | 2.4799 | 1900 | 0.0928 | - | - |
484
- | 2.5125 | 1925 | 0.0927 | - | - |
485
- | 2.5451 | 1950 | 0.0899 | - | - |
486
- | 2.5777 | 1975 | 0.0911 | - | - |
487
- | 2.6103 | 2000 | 0.0915 | 0.0671 | 0.6687 |
488
- | 2.6429 | 2025 | 0.0905 | - | - |
489
- | 2.6755 | 2050 | 0.0894 | - | - |
490
- | 2.7081 | 2075 | 0.0887 | - | - |
491
- | 2.7407 | 2100 | 0.0903 | - | - |
492
- | 2.7733 | 2125 | 0.0887 | - | - |
493
- | 2.8059 | 2150 | 0.0869 | - | - |
494
- | 2.8385 | 2175 | 0.0871 | - | - |
495
- | 2.8711 | 2200 | 0.0843 | - | - |
496
- | 2.9037 | 2225 | 0.0838 | - | - |
497
- | 2.9363 | 2250 | 0.0864 | - | - |
498
- | 2.9689 | 2275 | 0.0831 | - | - |
499
-
500
-
501
  ### Framework Versions
502
  - Python: 3.11.10
503
  - Sentence Transformers: 3.3.1
 
6
  - generated_from_trainer
7
  - dataset_size:392702
8
  - loss:CosineSimilarityLoss
9
+ base_model: answerdotai/ModernBERT-base
10
  widget:
11
  - source_sentence: 우리는 움직이는 동행 우주 정지 좌표계에 비례하여 이동하고 있습니다 ... 약 371km / s에서 별자리 leo
12
  쪽으로. "
 
61
  type: sts_dev
62
  metrics:
63
  - type: pearson_cosine
64
+ value: 0.8273878707711191
65
  name: Pearson Cosine
66
  - type: spearman_cosine
67
+ value: 0.8298080691919564
68
  name: Spearman Cosine
69
  - type: pearson_euclidean
70
+ value: 0.8112987734110177
71
  name: Pearson Euclidean
72
  - type: spearman_euclidean
73
+ value: 0.8214596205940881
74
  name: Spearman Euclidean
75
  - type: pearson_manhattan
76
+ value: 0.8125188338482303
77
  name: Pearson Manhattan
78
  - type: spearman_manhattan
79
+ value: 0.8226861322419045
80
  name: Spearman Manhattan
81
  - type: pearson_dot
82
+ value: 0.7646820898603437
83
  name: Pearson Dot
84
  - type: spearman_dot
85
+ value: 0.7648333772102188
86
  name: Spearman Dot
87
  - type: pearson_max
88
+ value: 0.8273878707711191
89
  name: Pearson Max
90
  - type: spearman_max
91
+ value: 0.8298080691919564
92
  name: Spearman Max
93
  ---
94
 
 
192
 
193
  | Metric | Value |
194
  |:-------------------|:-----------|
195
+ | pearson_cosine | 0.8274 |
196
+ | spearman_cosine | 0.8298 |
197
+ | pearson_euclidean | 0.8113 |
198
+ | spearman_euclidean | 0.8215 |
199
+ | pearson_manhattan | 0.8125 |
200
+ | spearman_manhattan | 0.8227 |
201
+ | pearson_dot | 0.7647 |
202
+ | spearman_dot | 0.7648 |
203
+ | pearson_max | 0.8274 |
204
+ | **spearman_max** | **0.8298** |
205
 
206
  <!--
207
  ## Bias, Risks and Limitations
 
267
  }
268
  ```
269
 
270
  ### Framework Versions
271
  - Python: 3.11.10
272
  - Sentence Transformers: 3.3.1