full_bert

This model is a fine-tuned version of an unspecified base model on an unspecified dataset (the card lists it as None). It achieves the following results on the evaluation set:

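  • Loss: nan (the validation loss became nan from epoch 41 onward; the last finite value was 1.9118 at epoch 40)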

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • total_train_batch_size: 96
  • total_eval_batch_size: 96
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10000
  • num_epochs: 45.0
  • mixed_precision_training: Native AMP
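The scheduler settings above can be sketched in plain Python. This is a minimal illustration, not the training code used for this model. It assumes the `linear` scheduler follows the warmup-then-linear-decay formula of Transformers' `get_linear_schedule_with_warmup`, that there was no gradient accumulation (16 × 6 devices = 96 matches total_train_batch_size), and that the 55,319 optimizer steps per epoch can be read off the Step column of the results table. `linear_lr` and the constants are hypothetical names.

```python
# Sketch of the linear warmup/decay schedule listed above (assumptions noted).
PER_DEVICE_BATCH = 16
NUM_DEVICES = 6
BASE_LR = 5e-05
WARMUP_STEPS = 10_000
STEPS_PER_EPOCH = 55_319  # assumption: read off the "Step" column of the results table
NUM_EPOCHS = 45

# Assumes no gradient accumulation, so total batch = per-device batch x devices.
TOTAL_TRAIN_BATCH = PER_DEVICE_BATCH * NUM_DEVICES  # 96
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS          # 2,489,355


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              warmup: int = WARMUP_STEPS,
              total: int = TOTAL_STEPS) -> float:
    """Learning rate at `step`: linear ramp up to base_lr, then linear decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("total batch:", TOTAL_TRAIN_BATCH, "total steps:", TOTAL_STEPS)
    for s in (0, 5_000, 10_000, 1_000_000, TOTAL_STEPS):
        print(s, linear_lr(s))
```

Under these assumptions the peak rate of 5e-05 is reached at step 10,000, partway through the first epoch (55,319 steps), and then decays linearly to 0 at step 2,489,355, the final step in the table.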

Training results

| Training Loss | Epoch | Step    | Validation Loss |
|---------------|-------|---------|-----------------|
| 6.778         | 1.0   | 55319   | 6.4618          |
| 6.4271        | 2.0   | 110638  | 6.3701          |
| 6.3616        | 3.0   | 165957  | 6.3217          |
| 6.3257        | 4.0   | 221276  | 6.2966          |
| 6.3001        | 5.0   | 276595  | 6.2759          |
| 6.2834        | 6.0   | 331914  | 6.2610          |
| 6.2699        | 7.0   | 387233  | 6.2465          |
| 6.2565        | 8.0   | 442552  | 6.1939          |
| 6.2221        | 9.0   | 497871  | 6.1154          |
| 6.0721        | 10.0  | 553190  | 5.9524          |
| 5.9212        | 11.0  | 608509  | 5.7947          |
| 5.8113        | 12.0  | 663828  | 5.7161          |
| 5.7509        | 13.0  | 719147  | 5.6614          |
| 5.7053        | 14.0  | 774466  | 5.6158          |
| 5.6665        | 15.0  | 829785  | 5.5774          |
| 5.634         | 16.0  | 885104  | 5.5448          |
| 5.6055        | 17.0  | 940423  | 2.7563          |
| 3.3308        | 18.0  | 995742  | 2.5443          |
| 2.6179        | 19.0  | 1051061 | 2.4196          |
| 2.5324        | 20.0  | 1106380 | 2.3393          |
| 2.4791        | 21.0  | 1161699 | 2.2755          |
| 2.4105        | 22.0  | 1217018 | 2.2241          |
| 2.3582        | 23.0  | 1272337 | 2.1772          |
| 2.3281        | 24.0  | 1327656 | 2.1416          |
| 2.2987        | 25.0  | 1382975 | 2.1137          |
| 2.7859        | 26.0  | 1438294 | 2.0950          |
| 2.2728        | 27.0  | 1493613 | 2.0685          |
| 2.2308        | 28.0  | 1548932 | 2.0499          |
| 2.1739        | 29.0  | 1604251 | 2.0082          |
| 2.1569        | 30.0  | 1659570 | 1.9939          |
| 2.1425        | 31.0  | 1714889 | 1.9802          |
| 2.1318        | 32.0  | 1770208 | 1.9669          |
| 2.1207        | 33.0  | 1825527 | 1.9583          |
| 2.1111        | 34.0  | 1880846 | 1.9477          |
| 2.102         | 35.0  | 1936165 | 1.9409          |
| 2.0943        | 36.0  | 1991484 | 1.9313          |
| 2.0871        | 37.0  | 2046803 | 1.9236          |
| 2.0736        | 38.0  | 2102122 | 1.9191          |
| 2.0693        | 39.0  | 2157441 | 1.9147          |
| 2.0653        | 40.0  | 2212760 | 1.9118          |
| 2.0755        | 41.0  | 2268079 | nan             |
| 0.0           | 42.0  | 2323398 | nan             |
| 0.0           | 43.0  | 2378717 | nan             |
| 0.0           | 44.0  | 2434036 | nan             |
| 0.0           | 45.0  | 2489355 | nan             |
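The evaluation loss of nan reported at the top of the card comes from the final epoch, while the last finite validation loss is 1.9118 at epoch 40. Here is a minimal sketch of locating that checkpoint from logged rows, using an excerpt of the table above. The helpers `last_finite` and `first_nonfinite` are hypothetical and not part of any library.

```python
import math

# Excerpt of (epoch, step, validation_loss) rows copied from the table above.
ROWS = [
    (38.0, 2102122, "1.9191"),
    (39.0, 2157441, "1.9147"),
    (40.0, 2212760, "1.9118"),
    (41.0, 2268079, "nan"),
    (42.0, 2323398, "nan"),
]


def last_finite(rows):
    """Return the last (epoch, step, loss) whose loss is finite, or None."""
    found = None
    for epoch, step, raw in rows:
        loss = float(raw)  # float("nan") parses to a NaN value
        if math.isfinite(loss):
            found = (epoch, step, loss)
    return found


def first_nonfinite(rows):
    """Return the first (epoch, step) whose loss is NaN or infinite, or None."""
    for epoch, step, raw in rows:
        if not math.isfinite(float(raw)):
            return (epoch, step)
    return None


if __name__ == "__main__":
    print(last_finite(ROWS))
    print(first_nonfinite(ROWS))
```

On this excerpt, `last_finite(ROWS)` gives `(40.0, 2212760, 1.9118)` and `first_nonfinite(ROWS)` gives `(41.0, 2268079)`. In a Transformers `Trainer` run, the `load_best_model_at_end` and `metric_for_best_model` options of `TrainingArguments` serve a related purpose.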

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.3.0a0+ebedce2
  • Datasets 2.17.1
  • Tokenizers 0.15.2
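To compare a local environment against the versions listed above, here is a small sketch using the standard-library `importlib.metadata`. The function name `check_versions` is hypothetical, and the keys assume the usual PyPI distribution names (`torch` for PyTorch). The PyTorch string `2.3.0a0+ebedce2` looks like a pre-release build with a local version suffix, so a stock PyPI wheel will generally not match it exactly.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card; keys are assumed PyPI distribution names.
EXPECTED = {
    "transformers": "4.38.2",
    "torch": "2.3.0a0+ebedce2",
    "datasets": "2.17.1",
    "tokenizers": "0.15.2",
}


def check_versions(expected=EXPECTED, lookup=version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, want in expected.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # package is not installed
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    for name, (want, found) in check_versions().items():
        print(f"{name}: card lists {want}, found {found or 'not installed'}")
```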

Model size and format

  • Model size: 126M params
  • Tensor type: F32
  • Weights format: Safetensors
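As a rough sizing check, F32 uses 4 bytes per parameter, so 126M parameters come to about 504 MB of raw weights (126,000,000 × 4 bytes). The sketch below shows that arithmetic. `weight_bytes` is a hypothetical helper, 126M is the card's rounded figure, and the estimate ignores the safetensors header.

```python
# Bytes per element for a few common safetensors dtype names.
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2, "BF16": 2, "I8": 1}


def weight_bytes(num_params: int, dtype: str = "F32") -> int:
    """Approximate size of the raw weights in bytes (ignores file header/metadata)."""
    return num_params * BYTES_PER_ELEMENT[dtype]


if __name__ == "__main__":
    n = 126_000_000  # "126M params" as rounded on the card
    for dt in ("F32", "F16"):
        print(dt, weight_bytes(n, dt) / 1e6, "MB")
```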