lapp0 committed
Commit 7e65596
1 Parent(s): c1ab5d1

End of training

README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
  The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
  It achieves the following results on the evaluation set:
- - eval_enwikippl: 133.3661
- - eval_frwikippl: 19650.0977
- - eval_zhwikippl: 54146.3867
- - eval_tinystoriesppl: 9.1470
- - eval_loss: 1.2078
- - eval_runtime: 12.9975
- - eval_samples_per_second: 76.938
- - eval_steps_per_second: 9.617
+ - eval_enwikippl: 12766.3359
+ - eval_frwikippl: 57742.3438
+ - eval_zhwikippl: 65334.25
+ - eval_tinystoriesppl: 4770.0942
+ - eval_loss: 5.2085
+ - eval_runtime: 13.0328
+ - eval_samples_per_second: 76.73
+ - eval_steps_per_second: 9.591
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment.
@@ -47,7 +47,7 @@ More information needed
  The following hyperparameters were used during training:
  - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  - train_embeddings: True
- - learning_rate: 4e-06
+ - learning_rate: 1e-06
  - train_batch_size: 1
  - eval_batch_size: 8
  - seed: 42
@@ -56,33 +56,33 @@ The following hyperparameters were used during training:
  - num_epochs: 1.0
 
  ### Resource Usage
- Peak GPU Memory: 6.6064 GB
+ Peak GPU Memory: 6.6048 GB
 
  ### Eval-Phase Metrics
  | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
- | 0 | 0 | 50480.5703 | 85684.4844 | 6.8305 | 13.0278 | 76.759 | 9.595 | 33932.0586 | 94692.1562 |
- | 5000 | 0.0505 | 133.2834 | 19661.1758 | 1.2085 | 13.0193 | 76.809 | 9.601 | 9.1500 | 54349.0273 |
- | 10000 | 0.1010 | 133.3712 | 19627.9590 | 1.2079 | 13.0358 | 76.712 | 9.589 | 9.1530 | 54117.5273 |
- | 15000 | 0.1515 | 133.2834 | 19650.0977 | 1.2077 | 13.0514 | 76.62 | 9.578 | 9.1402 | 54088.6328 |
- | 20000 | 0.2020 | 133.3299 | 19639.0254 | 1.2077 | 13.0057 | 76.889 | 9.611 | 9.1545 | 54204.1992 |
- | 25000 | 0.2525 | 133.4539 | 19650.0977 | 1.2079 | 13.0301 | 76.745 | 9.593 | 9.1538 | 54349.0273 |
- | 30000 | 0.3030 | 133.5160 | 19650.0977 | 1.2079 | 13.03 | 76.746 | 9.593 | 9.1542 | 54088.6328 |
- | 35000 | 0.3535 | 133.2834 | 19627.9590 | 1.2078 | 13.0569 | 76.588 | 9.573 | 9.1451 | 54117.5273 |
- | 40000 | 0.4040 | 133.3712 | 19627.9590 | 1.2078 | 12.9991 | 76.928 | 9.616 | 9.1523 | 54146.3867 |
- | 45000 | 0.4545 | 133.3041 | 19650.0977 | 1.2077 | 12.9923 | 76.969 | 9.621 | 9.1477 | 54088.6328 |
- | 50000 | 0.5051 | 133.2834 | 19650.0977 | 1.2078 | 13.1989 | 75.764 | 9.47 | 9.1470 | 54204.1992 |
- | 55000 | 0.5556 | 133.4953 | 19650.0977 | 1.2078 | 13.1556 | 76.013 | 9.502 | 9.1485 | 54117.5273 |
- | 60000 | 0.6061 | 133.4901 | 19661.1758 | 1.2077 | 13.206 | 75.723 | 9.465 | 9.1477 | 54117.5273 |
- | 65000 | 0.6566 | 133.4488 | 19650.0977 | 1.2077 | 13.0052 | 76.892 | 9.612 | 9.1470 | 54117.5273 |
- | 70000 | 0.7071 | 133.3661 | 19650.0977 | 1.2078 | 12.9996 | 76.925 | 9.616 | 9.1470 | 54117.5273 |
- | 75000 | 0.7576 | 133.4074 | 19650.0977 | 1.2079 | 13.0082 | 76.874 | 9.609 | 9.1470 | 54117.5273 |
- | 80000 | 0.8081 | 133.4488 | 19650.0977 | 1.2078 | 12.9816 | 77.032 | 9.629 | 9.1485 | 54117.5273 |
- | 85000 | 0.8586 | 133.3661 | 19650.0977 | 1.2077 | 12.9875 | 76.997 | 9.625 | 9.1470 | 54117.5273 |
- | 90000 | 0.9091 | 133.3661 | 19650.0977 | 1.2077 | 12.985 | 77.012 | 9.626 | 9.1462 | 54117.5273 |
- | 95000 | 0.9596 | 133.4074 | 19650.0977 | 1.2078 | 13.0478 | 76.641 | 9.58 | 9.1470 | 54146.3867 |
- | 99000 | 1.0 | 133.3661 | 19650.0977 | 1.2078 | 12.9975 | 76.938 | 9.617 | 9.1470 | 54146.3867 |
+ | 0 | 0 | 61801.5039 | 81001.6719 | 6.4680 | 13.0128 | 76.847 | 9.606 | 44522.7852 | 75358.2109 |
+ | 5000 | 0.0505 | 12766.3359 | 57742.3438 | 5.2085 | 12.9999 | 76.923 | 9.615 | 4771.6733 | 65264.5664 |
+ | 10000 | 0.1010 | 12766.3359 | 57742.3438 | 5.2085 | 13.0144 | 76.838 | 9.605 | 4768.5161 | 65334.25 |
+ | 15000 | 0.1515 | 12766.3359 | 57742.3438 | 5.2085 | 13.0239 | 76.782 | 9.598 | 4770.0942 | 65334.25 |
+ | 20000 | 0.2020 | 12766.3359 | 57742.3438 | 5.2085 | 12.9909 | 76.977 | 9.622 | 4769.3076 | 65334.25 |
+ | 25000 | 0.2525 | 12766.3359 | 57709.8086 | 5.2083 | 13.1403 | 76.102 | 9.513 | 4768.5161 | 65334.25 |
+ | 30000 | 0.3030 | 12766.3359 | 57709.8086 | 5.2083 | 13.0382 | 76.698 | 9.587 | 4768.5161 | 65334.25 |
+ | 35000 | 0.3535 | 12766.3359 | 57742.3438 | 5.2083 | 13.0826 | 76.438 | 9.555 | 4770.0942 | 65334.25 |
+ | 40000 | 0.4040 | 12766.3359 | 57742.3438 | 5.2085 | 13.0472 | 76.645 | 9.581 | 4769.3076 | 65334.25 |
+ | 45000 | 0.4545 | 12766.3359 | 57742.3438 | 5.2085 | 13.1664 | 75.951 | 9.494 | 4770.0942 | 65334.25 |
+ | 50000 | 0.5051 | 12766.3359 | 57742.3438 | 5.2083 | 13.047 | 76.646 | 9.581 | 4768.5161 | 65334.25 |
+ | 55000 | 0.5556 | 12766.3359 | 57742.3438 | 5.2083 | 13.2134 | 75.681 | 9.46 | 4768.5161 | 65334.25 |
+ | 60000 | 0.6061 | 12766.3359 | 57742.3438 | 5.2087 | 13.0275 | 76.761 | 9.595 | 4769.3076 | 65334.25 |
+ | 65000 | 0.6566 | 12766.3359 | 57742.3438 | 5.2083 | 13.1101 | 76.277 | 9.535 | 4768.5161 | 65334.25 |
+ | 70000 | 0.7071 | 12766.3359 | 57742.3438 | 5.2085 | 13.0485 | 76.637 | 9.58 | 4771.6733 | 65334.25 |
+ | 75000 | 0.7576 | 12766.3359 | 57742.3438 | 5.2085 | 13.0209 | 76.8 | 9.6 | 4768.5161 | 65299.4297 |
+ | 80000 | 0.8081 | 12766.3359 | 57742.3438 | 5.2085 | 13.0587 | 76.577 | 9.572 | 4771.6733 | 65334.25 |
+ | 85000 | 0.8586 | 12766.3359 | 57742.3438 | 5.2085 | 13.0404 | 76.685 | 9.586 | 4770.0942 | 65299.4297 |
+ | 90000 | 0.9091 | 12766.3359 | 57742.3438 | 5.2087 | 13.0082 | 76.874 | 9.609 | 4770.0942 | 65334.25 |
+ | 95000 | 0.9596 | 12766.3359 | 57742.3438 | 5.2085 | 13.0077 | 76.878 | 9.61 | 4769.3076 | 65334.25 |
+ | 99000 | 1.0 | 12766.3359 | 57742.3438 | 5.2085 | 13.0328 | 76.73 | 9.591 | 4770.0942 | 65334.25 |
 
  ### Framework versions
  - Distily 0.2.0
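
Note on the `distillation_objective` above: only the logits component has nonzero weight, so the training loss reduces to a KL divergence between the teacher's and student's next-token distributions (the hidden-state and attention components, at weight 0, contribute nothing). A minimal PyTorch sketch of such a logits-KL loss is below; it is not Distily's actual `LossComponent` implementation, and the `temperature` argument is an added assumption that the config above does not specify.

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """Mean per-token KL(teacher || student) over the vocabulary.

    Hypothetical sketch of a `loss_fn=kl` logits component; Distily's
    actual reduction and scaling may differ.
    """
    vocab_size = student_logits.size(-1)
    s = F.log_softmax(student_logits / temperature, dim=-1).reshape(-1, vocab_size)
    t = F.log_softmax(teacher_logits / temperature, dim=-1).reshape(-1, vocab_size)
    # kl_div takes log-probabilities for both tensors when log_target=True;
    # "batchmean" divides the summed KL by the number of rows, i.e. the
    # number of token positions after flattening batch and sequence dims.
    return F.kl_div(s, t, log_target=True, reduction="batchmean") * temperature ** 2
```

With the weights above (1 on logits, 0 elsewhere), this single term is the whole distillation loss; `train_embeddings: True` additionally keeps the student's embedding matrices trainable.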
logs/copy_teacher_modules=_(_lm_head___False)_, learning_rate=1e-06, max_grad_norm=100/events.out.tfevents.1724020242.5f530b1cf724 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ff96646ae5176807f12d658fc1af0c7a75c55c76879db145bd4b0f04c6b2fa8
+ size 312
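
For reference, the `eval_*ppl` metrics and `*ppl` table columns are perplexities on the respective corpora (English, French, and Chinese Wikipedia, plus TinyStories). The evaluation loop itself is not part of this commit; assuming perplexity is computed as the exponential of the mean token-level cross-entropy, a generic sketch using the `transformers` API would be:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    """exp(mean next-token cross-entropy) of a causal LM on `text`."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model shift them internally and
        # return the mean token-level cross-entropy as `loss`.
        out = model(input_ids=enc.input_ids, labels=enc.input_ids)
    return math.exp(out.loss.item())

# Example usage (repo id hypothetical; substitute the distilled checkpoint):
# model = AutoModelForCausalLM.from_pretrained("lapp0/<this-model>")
# tokenizer = AutoTokenizer.from_pretrained("lapp0/<this-model>")
# print(perplexity(model, tokenizer, "Once upon a time..."))
```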