lapp0 committed
Commit bcb54b5 (1 parent: 5a66adb)

End of training

README.md CHANGED
@@ -15,14 +15,14 @@ This student model is distilled from the teacher model [roneneldan/TinyStories-3
 The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
 
 It achieves the following results on the evaluation set:
- - eval_enwikippl: 4839.0244
- - eval_frwikippl: 39016.5469
- - eval_zhwikippl: 56057.3555
- - eval_tinystoriesppl: 1738.1987
- - eval_loss: 4.8480
- - eval_runtime: 13.0187
- - eval_samples_per_second: 76.813
- - eval_steps_per_second: 9.602
+ - eval_enwikippl: 133.3661
+ - eval_frwikippl: 19650.0977
+ - eval_zhwikippl: 54146.3867
+ - eval_tinystoriesppl: 9.1470
+ - eval_loss: 1.2078
+ - eval_runtime: 12.9975
+ - eval_samples_per_second: 76.938
+ - eval_steps_per_second: 9.617
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment.
@@ -47,7 +47,7 @@ More information needed
 The following hyperparameters were used during training:
 - distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
 - train_embeddings: True
- - learning_rate: 1e-06
+ - learning_rate: 4e-06
 - train_batch_size: 1
 - eval_batch_size: 8
 - seed: 42
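For context on the `distillation_objective` above: only the logits component is active (weight 1, `loss_fn=kl`); the hidden-state and attention components carry weight 0 and contribute nothing in this run. Below is a minimal PyTorch sketch of such a logits-only KL objective, illustrative only and not Distily's actual implementation:

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor) -> torch.Tensor:
    """Mean per-token KL(teacher || student) over the vocabulary.

    Sketch of a logits-only objective (weight=1, loss_fn=kl); the hs and
    attn components are weight=0 in this run, so they are omitted here.
    """
    vocab = student_logits.size(-1)
    log_p_student = F.log_softmax(student_logits.view(-1, vocab), dim=-1)
    p_teacher = F.softmax(teacher_logits.view(-1, vocab), dim=-1)
    # kl_div expects log-probabilities as input and probabilities as target;
    # "batchmean" sums over the vocab and averages over rows (tokens).
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")
```

With `train_embeddings: True`, the student's embedding weights are updated by this loss along with the rest of the model.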
 
@@ -56,33 +56,33 @@ The following hyperparameters were used during training:
 - num_epochs: 1.0
 
 ### Resource Usage
- Peak GPU Memory: 6.6048 GB
+ Peak GPU Memory: 6.6064 GB
 
 ### Eval-Phase Metrics
 | step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
 | **teacher eval** | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
- | 0 | 0 | 23326.0117 | 54995.6562 | 6.1135 | 13.0206 | 76.802 | 9.6 | 17003.8281 | 65438.9492 |
- | 5000 | 0.0505 | 4839.0244 | 39016.5469 | 4.8478 | 13.0137 | 76.842 | 9.605 | 1738.1987 | 56102.2266 |
- | 10000 | 0.1010 | 4839.0244 | 39016.5469 | 4.8480 | 12.9688 | 77.108 | 9.639 | 1738.1987 | 56087.25 |
- | 15000 | 0.1515 | 4839.0244 | 39016.5469 | 4.8488 | 12.9532 | 77.201 | 9.65 | 1739.0610 | 56102.2266 |
- | 20000 | 0.2020 | 4839.0244 | 39016.5469 | 4.8488 | 13.014 | 76.84 | 9.605 | 1738.4856 | 56072.2734 |
- | 25000 | 0.2525 | 4839.0244 | 39016.5469 | 4.8485 | 13.0041 | 76.899 | 9.612 | 1738.1987 | 56057.3555 |
- | 30000 | 0.3030 | 4839.0244 | 39016.5469 | 4.8488 | 12.9823 | 77.028 | 9.629 | 1738.4856 | 56057.3555 |
- | 35000 | 0.3535 | 4839.0244 | 39016.5469 | 4.8485 | 13.037 | 76.705 | 9.588 | 1739.0610 | 56072.2734 |
- | 40000 | 0.4040 | 4839.0244 | 39016.5469 | 4.8482 | 12.9918 | 76.972 | 9.621 | 1738.1987 | 56057.3555 |
- | 45000 | 0.4545 | 4839.0244 | 38994.5625 | 4.8485 | 12.9951 | 76.952 | 9.619 | 1738.7732 | 56057.3555 |
- | 50000 | 0.5051 | 4839.0244 | 39016.5469 | 4.8482 | 12.9731 | 77.083 | 9.635 | 1739.3488 | 56072.2734 |
- | 55000 | 0.5556 | 4839.0244 | 39016.5469 | 4.8478 | 12.9694 | 77.105 | 9.638 | 1738.4856 | 56057.3555 |
- | 60000 | 0.6061 | 4839.0244 | 39016.5469 | 4.8488 | 13.0353 | 76.715 | 9.589 | 1738.7732 | 56057.3555 |
- | 65000 | 0.6566 | 4839.0244 | 39016.5469 | 4.8478 | 13.0087 | 76.872 | 9.609 | 1738.1987 | 56057.3555 |
- | 70000 | 0.7071 | 4839.0244 | 39016.5469 | 4.8485 | 13.033 | 76.728 | 9.591 | 1738.1987 | 56057.3555 |
- | 75000 | 0.7576 | 4839.0244 | 39016.5469 | 4.8480 | 13.0328 | 76.729 | 9.591 | 1738.4856 | 56057.3555 |
- | 80000 | 0.8081 | 4839.0244 | 39016.5469 | 4.8482 | 12.9884 | 76.992 | 9.624 | 1737.9111 | 56057.3555 |
- | 85000 | 0.8586 | 4839.0244 | 39016.5469 | 4.8485 | 13.0047 | 76.895 | 9.612 | 1738.1987 | 56087.25 |
- | 90000 | 0.9091 | 4839.0244 | 39016.5469 | 4.8478 | 13.0255 | 76.772 | 9.597 | 1738.1987 | 56057.3555 |
- | 95000 | 0.9596 | 4839.0244 | 39016.5469 | 4.8482 | 13.0007 | 76.919 | 9.615 | 1738.1987 | 56057.3555 |
- | 99000 | 1.0 | 4839.0244 | 39016.5469 | 4.8480 | 13.0187 | 76.813 | 9.602 | 1738.1987 | 56057.3555 |
+ | 0 | 0 | 50480.5703 | 85684.4844 | 6.8305 | 13.0278 | 76.759 | 9.595 | 33932.0586 | 94692.1562 |
+ | 5000 | 0.0505 | 133.2834 | 19661.1758 | 1.2085 | 13.0193 | 76.809 | 9.601 | 9.1500 | 54349.0273 |
+ | 10000 | 0.1010 | 133.3712 | 19627.9590 | 1.2079 | 13.0358 | 76.712 | 9.589 | 9.1530 | 54117.5273 |
+ | 15000 | 0.1515 | 133.2834 | 19650.0977 | 1.2077 | 13.0514 | 76.62 | 9.578 | 9.1402 | 54088.6328 |
+ | 20000 | 0.2020 | 133.3299 | 19639.0254 | 1.2077 | 13.0057 | 76.889 | 9.611 | 9.1545 | 54204.1992 |
+ | 25000 | 0.2525 | 133.4539 | 19650.0977 | 1.2079 | 13.0301 | 76.745 | 9.593 | 9.1538 | 54349.0273 |
+ | 30000 | 0.3030 | 133.5160 | 19650.0977 | 1.2079 | 13.03 | 76.746 | 9.593 | 9.1542 | 54088.6328 |
+ | 35000 | 0.3535 | 133.2834 | 19627.9590 | 1.2078 | 13.0569 | 76.588 | 9.573 | 9.1451 | 54117.5273 |
+ | 40000 | 0.4040 | 133.3712 | 19627.9590 | 1.2078 | 12.9991 | 76.928 | 9.616 | 9.1523 | 54146.3867 |
+ | 45000 | 0.4545 | 133.3041 | 19650.0977 | 1.2077 | 12.9923 | 76.969 | 9.621 | 9.1477 | 54088.6328 |
+ | 50000 | 0.5051 | 133.2834 | 19650.0977 | 1.2078 | 13.1989 | 75.764 | 9.47 | 9.1470 | 54204.1992 |
+ | 55000 | 0.5556 | 133.4953 | 19650.0977 | 1.2078 | 13.1556 | 76.013 | 9.502 | 9.1485 | 54117.5273 |
+ | 60000 | 0.6061 | 133.4901 | 19661.1758 | 1.2077 | 13.206 | 75.723 | 9.465 | 9.1477 | 54117.5273 |
+ | 65000 | 0.6566 | 133.4488 | 19650.0977 | 1.2077 | 13.0052 | 76.892 | 9.612 | 9.1470 | 54117.5273 |
+ | 70000 | 0.7071 | 133.3661 | 19650.0977 | 1.2078 | 12.9996 | 76.925 | 9.616 | 9.1470 | 54117.5273 |
+ | 75000 | 0.7576 | 133.4074 | 19650.0977 | 1.2079 | 13.0082 | 76.874 | 9.609 | 9.1470 | 54117.5273 |
+ | 80000 | 0.8081 | 133.4488 | 19650.0977 | 1.2078 | 12.9816 | 77.032 | 9.629 | 9.1485 | 54117.5273 |
+ | 85000 | 0.8586 | 133.3661 | 19650.0977 | 1.2077 | 12.9875 | 76.997 | 9.625 | 9.1470 | 54117.5273 |
+ | 90000 | 0.9091 | 133.3661 | 19650.0977 | 1.2077 | 12.985 | 77.012 | 9.626 | 9.1462 | 54117.5273 |
+ | 95000 | 0.9596 | 133.4074 | 19650.0977 | 1.2078 | 13.0478 | 76.641 | 9.58 | 9.1470 | 54146.3867 |
+ | 99000 | 1.0 | 133.3661 | 19650.0977 | 1.2078 | 12.9975 | 76.938 | 9.617 | 9.1470 | 54146.3867 |
 
 ### Framework versions
 - Distily 0.2.0
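The `*ppl` columns are perplexities on the respective corpora (TinyStories plus English, French, and Chinese Wikipedia); perplexity is the exponential of the mean per-token cross-entropy, while the `loss` column is the KL distillation objective, which is why loss and perplexity sit on different scales. A minimal sketch of how such a corpus perplexity could be computed with `transformers`; the model id is a placeholder and this is not necessarily the exact evaluation code used here:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lapp0/<this-student-model>"  # placeholder, not the real repo id
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)

@torch.no_grad()
def corpus_perplexity(texts: list[str]) -> float:
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True)
        # Passing labels=input_ids makes the model return the mean
        # cross-entropy over the n-1 next-token predictions.
        out = model(**enc, labels=enc["input_ids"])
        n = enc["input_ids"].size(1) - 1
        total_nll += out.loss.item() * n
        total_tokens += n
    return math.exp(total_nll / total_tokens)  # ppl = exp(mean NLL)
```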
logs/copy_teacher_modules=_(_lm_head___False)_, learning_rate=4e-06/events.out.tfevents.1724012794.5f530b1cf724 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a83ba5dba130affcfe3553bfe73b6e9eeadd76f9c351871e8c5c0a98ee6166bb
+ size 312
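The added file is a Git LFS pointer: the `oid` and `size` identify the real `events.out.tfevents` TensorBoard log, which `git lfs pull` downloads in its place. Once the actual file is present, its scalars can be read with TensorBoard's event accumulator; a minimal sketch, with the scalar tag names left open since they are repo-specific:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Path of the event file added in this commit (after `git lfs pull`).
path = ("logs/copy_teacher_modules=_(_lm_head___False)_, learning_rate=4e-06/"
        "events.out.tfevents.1724012794.5f530b1cf724")
acc = EventAccumulator(path)
acc.Reload()                   # parse the event file
print(acc.Tags()["scalars"])   # list the available scalar tags
# Each tag yields events with .wall_time, .step, and .value:
# for event in acc.Scalars("<some tag>"):  # hypothetical tag name
#     print(event.step, event.value)
```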