---
language:
  - en
base_model: pszemraj/MiniLMv2-L6-H384_R-simplewiki
tags:
  - generated_from_trainer
metrics:
  - accuracy
license: apache-2.0
datasets:
  - BEE-spoke-data/fineweb-100k_en-med
---

# MiniLMv2-L6-H384_R-simplewiki-fineweb-100k_en-med_512-vN

This model is a fine-tuned version of [pszemraj/MiniLMv2-L6-H384_R-simplewiki](https://huggingface.co/pszemraj/MiniLMv2-L6-H384_R-simplewiki) on the BEE-spoke-data/fineweb-100k_en-med dataset.
It achieves the following results on the evaluation set:

- Loss: 4.0206
- Accuracy: 0.3783
- Num Input Tokens Seen: 162790400
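
The training objective is not stated on this card, but the loss/accuracy pairing is typical of masked language modeling with the `transformers` `Trainer`. Below is a minimal inference sketch under that assumption; note that the repo id copies this card's placeholder title (including the `-vN` suffix) and may not match the final Hub path:

```python
from transformers import pipeline

# Assumed repo id, taken from this card's placeholder title ("-vN" suffix);
# substitute the actual Hub path of the published checkpoint.
model_id = "pszemraj/MiniLMv2-L6-H384_R-simplewiki-fineweb-100k_en-med_512-vN"

# MiniLMv2 is a small BERT-style encoder, so fill-mask is the natural task
# for a checkpoint trained with a masked-language-modeling objective.
fill = pipeline("fill-mask", model=model_id)

# Build the prompt with the tokenizer's own mask token so the snippet works
# whether the vocabulary uses [MASK] or <mask>.
text = f"The capital of France is {fill.tokenizer.mask_token}."
for pred in fill(text):
    print(f"{pred['token_str']!r}: {pred['score']:.3f}")
```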

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 8e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 1792
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_steps: 100
- num_epochs: 2.0
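
For context, here is a hedged sketch of how these settings could be expressed as `transformers` `TrainingArguments` (4.40.x). The `output_dir` is a hypothetical name, and the actual training script is not part of this card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="minilmv2-fineweb-100k_en-med",  # hypothetical name
    learning_rate=8e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=1792,
    gradient_accumulation_steps=16,  # 8 per device x 16 steps = 128 effective
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-7,
    lr_scheduler_type="inverse_sqrt",
    warmup_steps=100,
    num_train_epochs=2.0,
)
```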

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-----------------:|
| 4.6583        | 0.1208 | 150  | 4.5052          | 0.3406   | 9830400           |
| 4.5365        | 0.2415 | 300  | 4.3712          | 0.3525   | 19660800          |
| 4.4621        | 0.3623 | 450  | 4.2810          | 0.3575   | 29491200          |
| 4.4116        | 0.4831 | 600  | 4.2466          | 0.3615   | 39321600          |
| 4.3487        | 0.6038 | 750  | 4.1795          | 0.3661   | 49152000          |
| 4.338         | 0.7246 | 900  | 4.1874          | 0.3663   | 58982400          |
| 4.342         | 0.8454 | 1050 | 4.1475          | 0.3695   | 68812800          |
| 4.268         | 0.9661 | 1200 | 4.1215          | 0.3714   | 78643200          |
| 4.2185        | 1.0869 | 1350 | 4.1032          | 0.3725   | 88472576          |
| 4.2645        | 1.2077 | 1500 | 4.0859          | 0.3757   | 98302976          |
| 4.2542        | 1.3284 | 1650 | 4.0730          | 0.3750   | 108133376         |
| 4.2614        | 1.4492 | 1800 | 4.0682          | 0.3749   | 117963776         |
| 4.1928        | 1.5700 | 1950 | 4.0596          | 0.3758   | 127794176         |
| 4.1971        | 1.6907 | 2100 | 4.0505          | 0.3777   | 137624576         |
| 4.1966        | 1.8115 | 2250 | 4.0163          | 0.3787   | 147454976         |
| 4.16          | 1.9323 | 2400 | 4.0352          | 0.3774   | 157285376         |
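
If the validation loss is a mean cross-entropy in nats, as is standard for `Trainer` language-modeling runs (an assumption; the card does not state the objective), it corresponds to a perplexity of roughly `exp(4.5052) ≈ 90.5` at the first checkpoint and `exp(4.0206) ≈ 55.7` at the end of training.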

### Framework versions

- Transformers 4.40.1
- Pytorch 2.3.0+cu118
- Datasets 2.19.0
- Tokenizers 0.19.1