lapp0 committed on
Commit
aaf1356
1 Parent(s): f05a0c7

Training in progress, step 20

README.md CHANGED
@@ -1,7 +1,6 @@
 ---
-base_model: gpt2
-library_name: Distily
 license: mit
+base_model: gpt2
 tags:
 - generated_from_trainer
 model-index:
@@ -9,24 +8,14 @@ model-index:
   results: []
 ---
 
-# distily_modelcard_try
-
-This student model is distilled from the teacher model [gpt2](https://huggingface.co/gpt2) using the dataset (unspecified).
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
 
-The [Distily](https://github.com/lapp0/distily) library was used for this distillation.
+# distily_modelcard_try
 
+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- eval_enwikippl: 38656.0
-- eval_frwikippl: 218112.0
-- eval_zhwikippl: 54001664.0
-- eval_tinystoriesppl: 12160.0
-- eval_loss: 6.4375
-- eval_runtime: 0.0668
-- eval_samples_per_second: 29.948
-- eval_steps_per_second: 14.974
-
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment.
+- Loss: 8.125
 
 ## Model description
 
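The removed eval_*ppl entries are perplexities on what appear to be English, French, and Chinese Wikipedia plus TinyStories evaluation sets. The card does not show how Distily computes them; the usual recipe for a causal LM is the exponential of the mean token cross-entropy, sketched here under that assumption:

```python
import math
import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor) -> float:
    # Mean token cross-entropy from a causal LM, exponentiated. Assumes a
    # transformers-style model (e.g. GPT2LMHeadModel) that shifts labels
    # internally when labels are supplied. How Distily derives eval_enwikippl
    # and friends is not shown in this card.
    loss = model(input_ids=input_ids, labels=input_ids).loss
    return math.exp(loss.item())
```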
 
@@ -39,15 +28,12 @@ More information needed
 ## Training and evaluation data
 
 More information needed
--->
 
 ## Training procedure
 
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=mse, layer_mapper=last, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=mse, layer_mapper=layer-2, projector=None))
-- train_embeddings: True
 - learning_rate: 0.0001
 - train_batch_size: 16
 - eval_batch_size: 8
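The removed distillation_objective line pins down the whole training loss: only the logits component is active (weight=1, loss_fn=kl), while the hidden-state and attention components carry weight 0. A minimal sketch of that effective objective, assuming it is a standard forward KL between teacher and student token distributions; the temperature argument is illustrative and not part of the logged config:

```python
import torch
import torch.nn.functional as F

def logits_kl_loss(student_logits: torch.Tensor,
                   teacher_logits: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    # Forward KL(teacher || student) over the vocabulary, averaged per token.
    # `temperature` is illustrative only; it does not appear in the logged config.
    student_logp = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, -2)
    teacher_logp = F.log_softmax(teacher_logits / temperature, dim=-1).flatten(0, -2)
    # kl_div expects log-probs as input; log_target=True means the target is
    # also given as log-probs. "batchmean" divides by the number of rows,
    # i.e. the number of tokens after flattening.
    return F.kl_div(student_logp, teacher_logp, log_target=True,
                    reduction="batchmean") * temperature ** 2
```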
@@ -57,18 +43,16 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.2
 - num_epochs: 1.0
 
-### Resource Usage
-Peak GPU Memory: 15.4263 GB
+### Training results
+
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| No log | 0 | 0 | 23.625 |
 
-### Eval-Phase Metrics
-| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
-| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| **teacher eval** | | 43.75 | 61.75 | | | | | 11.8125 | 19.125 |
-| 0 | 0 | 738734374912.0 | 47828755808256.0 | 20.375 | 0.128 | 15.619 | 7.81 | 2617245696.0 | 12232066859008.0 |
-| 10 | 1.0 | 38656.0 | 218112.0 | 6.4375 | 0.0668 | 29.948 | 14.974 | 12160.0 | 54001664.0 |
 
 ### Framework versions
-- Distily 0.2.0
+
 - Transformers 4.44.0
 - Pytorch 2.3.0
 - Datasets 2.21.0
+- Tokenizers 0.19.1
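The hyperparameters reported across the two hunks above correspond to a transformers TrainingArguments along these lines; output_dir is assumed, and the settings the card elides are left at their defaults:

```python
from transformers import TrainingArguments

# Only the values shown in the card are grounded; output_dir and everything
# the card omits are not specified here.
args = TrainingArguments(
    output_dir="distily_modelcard_try",  # assumed, not in the card
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    warmup_ratio=0.2,
    num_train_epochs=1.0,
)
```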
 
 
logs/per_device_train_batch_size=16/events.out.tfevents.1724249431.f383272e719b ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:866c45fdc821670bda6395617655199f9e372e784633b6b24cd6cf15c590da6a
+size 568
logs/per_device_train_batch_size=8/events.out.tfevents.1724249657.f383272e719b ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:23a1ac30f91450dcdb0639684e28be61c0019fe4bd7e82be7d13cbbd745f7649
+size 11695
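Each of the two ADDED log entries above is a Git LFS pointer file, not the TensorBoard event data itself: three lines recording the spec version, the sha256 of the actual content, and its size in bytes. A sketch of how such a pointer text is derived from a local file:

```python
import hashlib
from pathlib import Path

def lfs_pointer(path: str) -> str:
    # Reproduces the three-line pointer format committed above:
    # version / oid sha256:<hex> / size <bytes>.
    data = Path(path).read_bytes()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{hashlib.sha256(data).hexdigest()}\n"
        f"size {len(data)}\n"
    )
```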
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ef54b6c21f3e2969ec448c785ea838f4b6fb7363f1c544f03d08a0b7efdac051
+oid sha256:9bb9de2be01c242b09837c7f50e515fb69af4c10bb59ee0ed99fed4303e7ae7c
 size 248894656
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c81b96d2d153fea78d2ec64ea37216c382e12d6073cd6dd197061f03dd43be2e
+oid sha256:a74ed808eb463babbe034da6d49b7bce830f5a04b451ce3027a4bf3a88084dce
 size 1017899080
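training_args.bin is the pickled TrainingArguments object that the Trainer saves alongside checkpoints; here only the oid changes while the size stays the same. Assuming a local copy, it can be inspected with torch.load:

```python
import torch
from transformers import TrainingArguments  # imported so the pickle can resolve the class

# weights_only=False because this is a pickled Python object, not a tensor dict;
# only load files you trust this way.
args: TrainingArguments = torch.load("training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_train_batch_size)
```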