---
library_name: transformers
license: apache-2.0
base_model: HuggingFaceTB/SmolLM-135M
tags:
  - generated_from_trainer
model-index:
  - name: distily_smollm_dataset_sweep
    results: []
---

# distily_smollm_dataset_sweep

This model is a fine-tuned version of [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.2647
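
A minimal inference sketch with the 🤗 Transformers API; the repository id `lapp0/distily_smollm_dataset_sweep` is an assumption inferred from the model name and is not confirmed by this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, inferred from the model name above.
repo_id = "lapp0/distily_smollm_dataset_sweep"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from a sample prompt.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```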

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent `TrainingArguments` follows the list):

- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: polynomial
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0
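
As a rough sketch, these values map onto `transformers.TrainingArguments` as shown below; the `output_dir` is hypothetical, and the actual training script may construct its arguments differently:

```python
from transformers import TrainingArguments

# Hyperparameters from the list above, expressed as TrainingArguments.
training_args = TrainingArguments(
    output_dir="distily_smollm_dataset_sweep",  # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="polynomial",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
)
```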

### Training results

| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| No log        | 0      | 0      | 18.8388         |
| 1.2041        | 0.0401 | 5000   | 1.1584          |
| 0.7528        | 0.0802 | 10000  | 0.7396          |
| 0.5961        | 0.1202 | 15000  | 0.6070          |
| 0.5023        | 0.1603 | 20000  | 0.5307          |
| 0.4706        | 0.2004 | 25000  | 0.4836          |
| 0.4605        | 0.2405 | 30000  | 0.4512          |
| 0.417         | 0.2806 | 35000  | 0.4251          |
| 0.4027        | 0.3206 | 40000  | 0.4071          |
| 0.3693        | 0.3607 | 45000  | 0.3898          |
| 0.3745        | 0.4008 | 50000  | 0.3759          |
| 0.3652        | 0.4409 | 55000  | 0.3632          |
| 0.3537        | 0.4810 | 60000  | 0.3529          |
| 0.3665        | 0.5210 | 65000  | 0.3440          |
| 0.3177        | 0.5611 | 70000  | 0.3346          |
| 0.3102        | 0.6012 | 75000  | 0.3269          |
| 0.3023        | 0.6413 | 80000  | 0.3198          |
| 0.3076        | 0.6814 | 85000  | 0.3125          |
| 0.3388        | 0.7214 | 90000  | 0.3062          |
| 0.298         | 0.7615 | 95000  | 0.3003          |
| 0.3052        | 0.8016 | 100000 | 0.2941          |
| 0.2678        | 0.8417 | 105000 | 0.2880          |
| 0.2684        | 0.8818 | 110000 | 0.2824          |
| 0.274         | 0.9218 | 115000 | 0.2764          |
| 0.2647        | 0.9619 | 120000 | 0.2706          |

### Framework versions

- Transformers 4.45.0.dev0
- Pytorch 2.5.0.dev20240910+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1