yoshitomo-matsubara committed on
Commit
c0e1315
1 Parent(s): 209e213

tuned hyperparameters

Files changed (3)
  1. pytorch_model.bin +1 -1
  2. tokenizer.json +0 -0
  3. training.log +75 -44
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2c56b0e5ff5f633a50e058d03a2f352413dcfaaa5be4340d71ab956d58ab39c1
+ oid sha256:ada5a495d6d195958b85e970e0feebbe34cad75f595ee476ea7fd05c4a047d79
  size 1340746825
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
training.log CHANGED
@@ -1,50 +1,81 @@
- 2021-05-22 19:12:22,619 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qnli/ce/bert_large_uncased.yaml', log='log/glue/qnli/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='qnli', test_only=False, world_size=1)
- 2021-05-22 19:12:22,665 INFO __main__ Distributed environment: NO
  Num processes: 1
  Process index: 0
  Local process index: 0
  Device: cuda
  Use FP16 precision: True
 
- 2021-05-22 19:13:15,971 INFO __main__ Start training
- 2021-05-22 19:13:15,972 INFO torchdistill.models.util [student model]
- 2021-05-22 19:13:15,972 INFO torchdistill.models.util Using the original student model
- 2021-05-22 19:13:15,972 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
- 2021-05-22 19:13:21,553 INFO torchdistill.misc.log Epoch: [0] [ 0/3274] eta: 0:20:16 lr: 1.9997963754836084e-05 sample/s: 11.305340672017941 loss: 0.7982 (0.7982) time: 0.3715 data: 0.0177 max mem: 6528
- 2021-05-22 19:15:17,991 INFO torchdistill.misc.log Epoch: [0] [ 500/3274] eta: 0:10:46 lr: 1.8979841172877215e-05 sample/s: 20.366457991155265 loss: 0.3002 (0.4591) time: 0.2386 data: 0.0051 max mem: 12387
- 2021-05-22 19:17:15,461 INFO torchdistill.misc.log Epoch: [0] [1000/3274] eta: 0:08:52 lr: 1.796171859091835e-05 sample/s: 14.86020497731184 loss: 0.2730 (0.3861) time: 0.2408 data: 0.0047 max mem: 12387
- 2021-05-22 19:19:11,888 INFO torchdistill.misc.log Epoch: [0] [1500/3274] eta: 0:06:54 lr: 1.694359600895948e-05 sample/s: 16.38974601447187 loss: 0.2440 (0.3542) time: 0.2363 data: 0.0047 max mem: 12387
- 2021-05-22 19:21:08,394 INFO torchdistill.misc.log Epoch: [0] [2000/3274] eta: 0:04:57 lr: 1.5925473427000613e-05 sample/s: 17.12289296313987 loss: 0.2741 (0.3352) time: 0.2356 data: 0.0047 max mem: 12387
- 2021-05-22 19:23:05,117 INFO torchdistill.misc.log Epoch: [0] [2500/3274] eta: 0:03:00 lr: 1.4907350845041744e-05 sample/s: 21.28394165862149 loss: 0.2282 (0.3246) time: 0.2402 data: 0.0048 max mem: 12387
- 2021-05-22 19:25:01,923 INFO torchdistill.misc.log Epoch: [0] [3000/3274] eta: 0:01:03 lr: 1.3889228263082878e-05 sample/s: 18.714775413316023 loss: 0.2385 (0.3134) time: 0.2286 data: 0.0049 max mem: 12387
- 2021-05-22 19:26:05,878 INFO torchdistill.misc.log Epoch: [0] Total time: 0:12:44
- 2021-05-22 19:26:17,529 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
- 2021-05-22 19:26:17,529 INFO __main__ Validation: accuracy = 0.9150649826102873
- 2021-05-22 19:26:17,529 INFO __main__ Updating ckpt
- 2021-05-22 19:26:22,622 INFO torchdistill.misc.log Epoch: [1] [ 0/3274] eta: 0:11:40 lr: 1.3331297088169417e-05 sample/s: 19.677133558129306 loss: 0.1484 (0.1484) time: 0.2138 data: 0.0105 max mem: 12387
- 2021-05-22 19:28:19,309 INFO torchdistill.misc.log Epoch: [1] [ 500/3274] eta: 0:10:47 lr: 1.2313174506210548e-05 sample/s: 17.8955634464596 loss: 0.0962 (0.1458) time: 0.2408 data: 0.0048 max mem: 12387
- 2021-05-22 19:30:15,867 INFO torchdistill.misc.log Epoch: [1] [1000/3274] eta: 0:08:50 lr: 1.1295051924251682e-05 sample/s: 18.815583413520308 loss: 0.0378 (0.1418) time: 0.2301 data: 0.0046 max mem: 12387
- 2021-05-22 19:32:11,984 INFO torchdistill.misc.log Epoch: [1] [1500/3274] eta: 0:06:53 lr: 1.0276929342292811e-05 sample/s: 16.385696175580676 loss: 0.1266 (0.1432) time: 0.2332 data: 0.0046 max mem: 12387
- 2021-05-22 19:34:09,106 INFO torchdistill.misc.log Epoch: [1] [2000/3274] eta: 0:04:57 lr: 9.258806760333945e-06 sample/s: 20.490180057719105 loss: 0.1234 (0.1409) time: 0.2426 data: 0.0047 max mem: 12387
- 2021-05-22 19:36:05,522 INFO torchdistill.misc.log Epoch: [1] [2500/3274] eta: 0:03:00 lr: 8.240684178375076e-06 sample/s: 18.810963296837595 loss: 0.0786 (0.1388) time: 0.2322 data: 0.0049 max mem: 12387
- 2021-05-22 19:38:03,104 INFO torchdistill.misc.log Epoch: [1] [3000/3274] eta: 0:01:03 lr: 7.222561596416208e-06 sample/s: 17.176536111520633 loss: 0.1368 (0.1376) time: 0.2340 data: 0.0049 max mem: 12387
- 2021-05-22 19:39:06,921 INFO torchdistill.misc.log Epoch: [1] Total time: 0:12:44
- 2021-05-22 19:39:18,573 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
- 2021-05-22 19:39:18,573 INFO __main__ Validation: accuracy = 0.9172615778876075
- 2021-05-22 19:39:18,573 INFO __main__ Updating ckpt
- 2021-05-22 19:39:23,961 INFO torchdistill.misc.log Epoch: [2] [ 0/3274] eta: 0:13:22 lr: 6.6646304215027494e-06 sample/s: 18.064977953408743 loss: 0.0858 (0.0858) time: 0.2453 data: 0.0238 max mem: 12387
- 2021-05-22 19:41:20,819 INFO torchdistill.misc.log Epoch: [2] [ 500/3274] eta: 0:10:48 lr: 5.646507839543881e-06 sample/s: 20.492357451881382 loss: 0.0001 (0.0684) time: 0.2278 data: 0.0046 max mem: 12387
- 2021-05-22 19:43:17,647 INFO torchdistill.misc.log Epoch: [2] [1000/3274] eta: 0:08:51 lr: 4.628385257585013e-06 sample/s: 17.50339693191113 loss: 0.0000 (0.0857) time: 0.2364 data: 0.0047 max mem: 12387
- 2021-05-22 19:45:13,046 INFO torchdistill.misc.log Epoch: [2] [1500/3274] eta: 0:06:52 lr: 3.6102626756261456e-06 sample/s: 15.964860060672729 loss: 0.0000 (0.0954) time: 0.2298 data: 0.0047 max mem: 12387
- 2021-05-22 19:47:08,839 INFO torchdistill.misc.log Epoch: [2] [2000/3274] eta: 0:04:56 lr: 2.5921400936672775e-06 sample/s: 19.177665171901772 loss: 0.0000 (0.0998) time: 0.2280 data: 0.0047 max mem: 12387
- 2021-05-22 19:49:03,179 INFO torchdistill.misc.log Epoch: [2] [2500/3274] eta: 0:02:59 lr: 1.5740175117084096e-06 sample/s: 21.98736368652192 loss: 0.0000 (0.1038) time: 0.2352 data: 0.0047 max mem: 12387
- 2021-05-22 19:50:59,281 INFO torchdistill.misc.log Epoch: [2] [3000/3274] eta: 0:01:03 lr: 5.558949297495419e-07 sample/s: 16.504755496267613 loss: 0.0000 (0.1080) time: 0.2303 data: 0.0047 max mem: 12387
- 2021-05-22 19:52:01,461 INFO torchdistill.misc.log Epoch: [2] Total time: 0:12:37
- 2021-05-22 19:52:13,103 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
- 2021-05-22 19:52:13,104 INFO __main__ Validation: accuracy = 0.9214717188358045
- 2021-05-22 19:52:13,104 INFO __main__ Updating ckpt
- 2021-05-22 19:52:26,136 INFO __main__ [Student: bert-large-uncased]
- 2021-05-22 19:52:37,770 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
- 2021-05-22 19:52:37,771 INFO __main__ Test: accuracy = 0.9214717188358045
- 2021-05-22 19:52:37,771 INFO __main__ Start prediction for private dataset(s)
- 2021-05-22 19:52:37,772 INFO __main__ qnli/test: 5463 samples
+ 2021-05-26 16:42:43,973 INFO __main__ Namespace(adjust_lr=False, config='torchdistill/configs/sample/glue/qnli/ce/bert_large_uncased.yaml', log='log/glue/qnli/ce/bert_large_uncased.txt', private_output='leaderboard/glue/standard/bert_large_uncased/', seed=None, student_only=False, task_name='qnli', test_only=False, world_size=1)
+ 2021-05-26 16:42:44,037 INFO __main__ Distributed environment: NO
  Num processes: 1
  Process index: 0
  Local process index: 0
  Device: cuda
  Use FP16 precision: True
 
+ 2021-05-26 16:42:44,389 INFO filelock Lock 139623502170640 acquired on /root/.cache/huggingface/transformers/1cf090f220f9674b67b3434decfe4d40a6532d7849653eac435ff94d31a4904c.1d03e5e4fa2db2532c517b2cd98290d8444b237619bd3d2039850a6d5e86473d.lock
+ 2021-05-26 16:42:44,742 INFO filelock Lock 139623502170640 released on /root/.cache/huggingface/transformers/1cf090f220f9674b67b3434decfe4d40a6532d7849653eac435ff94d31a4904c.1d03e5e4fa2db2532c517b2cd98290d8444b237619bd3d2039850a6d5e86473d.lock
+ 2021-05-26 16:42:45,448 INFO filelock Lock 139623502137488 acquired on /root/.cache/huggingface/transformers/e12f02d630da91a0982ce6db1ad595231d155a2b725ab106971898276d842ecc.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
+ 2021-05-26 16:42:45,957 INFO filelock Lock 139623502137488 released on /root/.cache/huggingface/transformers/e12f02d630da91a0982ce6db1ad595231d155a2b725ab106971898276d842ecc.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
+ 2021-05-26 16:42:46,307 INFO filelock Lock 139623464315024 acquired on /root/.cache/huggingface/transformers/475d46024228961ca8770cead39e1079f135fd2441d14cf216727ffac8d41d78.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
+ 2021-05-26 16:42:46,874 INFO filelock Lock 139623464315024 released on /root/.cache/huggingface/transformers/475d46024228961ca8770cead39e1079f135fd2441d14cf216727ffac8d41d78.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
+ 2021-05-26 16:42:47,920 INFO filelock Lock 139623502137488 acquired on /root/.cache/huggingface/transformers/300ecd79785b4602752c0085f8a89c3f0232ef367eda291c79a5600f3778b677.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
+ 2021-05-26 16:42:48,273 INFO filelock Lock 139623502137488 released on /root/.cache/huggingface/transformers/300ecd79785b4602752c0085f8a89c3f0232ef367eda291c79a5600f3778b677.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
+ 2021-05-26 16:42:48,641 INFO filelock Lock 139623464420688 acquired on /root/.cache/huggingface/transformers/1d959166dd7e047e57ea1b2d9b7b9669938a7e90c5e37a03961ad9f15eaea17f.fea64cd906e3766b04c92397f9ad3ff45271749cbe49829a079dd84e34c1697d.lock
+ 2021-05-26 16:43:11,143 INFO filelock Lock 139623464420688 released on /root/.cache/huggingface/transformers/1d959166dd7e047e57ea1b2d9b7b9669938a7e90c5e37a03961ad9f15eaea17f.fea64cd906e3766b04c92397f9ad3ff45271749cbe49829a079dd84e34c1697d.lock
+ 2021-05-26 16:43:38,005 INFO __main__ Start training
+ 2021-05-26 16:43:38,006 INFO torchdistill.models.util [student model]
+ 2021-05-26 16:43:38,006 INFO torchdistill.models.util Using the original student model
+ 2021-05-26 16:43:38,006 INFO torchdistill.core.training Loss = 1.0 * OrgLoss
+ 2021-05-26 16:43:44,804 INFO torchdistill.misc.log Epoch: [0] [ 0/6547] eta: 1:18:28 lr: 1.9998981721908255e-05 sample/s: 5.754612514483244 loss: 0.7042 (0.7042) time: 0.7192 data: 0.0241 max mem: 5376
+ 2021-05-26 16:48:21,522 INFO torchdistill.misc.log Epoch: [0] [ 500/6547] eta: 0:55:48 lr: 1.9489842676034826e-05 sample/s: 8.624026680319357 loss: 0.3131 (0.4845) time: 0.5568 data: 0.0025 max mem: 9056
+ 2021-05-26 16:52:58,289 INFO torchdistill.misc.log Epoch: [0] [1000/6547] eta: 0:51:11 lr: 1.89807036301614e-05 sample/s: 5.549441556800583 loss: 0.3426 (0.4185) time: 0.5480 data: 0.0025 max mem: 9056
+ 2021-05-26 16:57:38,265 INFO torchdistill.misc.log Epoch: [0] [1500/6547] eta: 0:46:44 lr: 1.847156458428797e-05 sample/s: 8.61720964609647 loss: 0.2848 (0.3878) time: 0.5259 data: 0.0025 max mem: 9056
+ 2021-05-26 17:02:15,556 INFO torchdistill.misc.log Epoch: [0] [2000/6547] eta: 0:42:05 lr: 1.7962425538414542e-05 sample/s: 8.623813900317252 loss: 0.2981 (0.3675) time: 0.5580 data: 0.0026 max mem: 9056
+ 2021-05-26 17:06:49,604 INFO torchdistill.misc.log Epoch: [0] [2500/6547] eta: 0:37:21 lr: 1.7453286492541113e-05 sample/s: 7.9935431184565235 loss: 0.2886 (0.3520) time: 0.5320 data: 0.0025 max mem: 9056
+ 2021-05-26 17:11:23,224 INFO torchdistill.misc.log Epoch: [0] [3000/6547] eta: 0:32:40 lr: 1.6944147446667688e-05 sample/s: 7.997067573087834 loss: 0.2193 (0.3400) time: 0.5656 data: 0.0025 max mem: 9056
+ 2021-05-26 17:15:59,135 INFO torchdistill.misc.log Epoch: [0] [3500/6547] eta: 0:28:04 lr: 1.643500840079426e-05 sample/s: 7.416189779873241 loss: 0.2485 (0.3283) time: 0.5845 data: 0.0026 max mem: 9056
+ 2021-05-26 17:20:33,027 INFO torchdistill.misc.log Epoch: [0] [4000/6547] eta: 0:23:26 lr: 1.592586935492083e-05 sample/s: 6.24177459511022 loss: 0.2533 (0.3213) time: 0.5377 data: 0.0025 max mem: 9056
+ 2021-05-26 17:25:08,083 INFO torchdistill.misc.log Epoch: [0] [4500/6547] eta: 0:18:49 lr: 1.5416730309047404e-05 sample/s: 8.622683682589413 loss: 0.3024 (0.3135) time: 0.5730 data: 0.0025 max mem: 9056
+ 2021-05-26 17:29:46,131 INFO torchdistill.misc.log Epoch: [0] [5000/6547] eta: 0:14:14 lr: 1.4907591263173975e-05 sample/s: 6.9801876226312105 loss: 0.1747 (0.3079) time: 0.5553 data: 0.0025 max mem: 9056
+ 2021-05-26 17:34:22,397 INFO torchdistill.misc.log Epoch: [0] [5500/6547] eta: 0:09:38 lr: 1.4398452217300548e-05 sample/s: 6.5910168939530545 loss: 0.1467 (0.3024) time: 0.5469 data: 0.0026 max mem: 9056
+ 2021-05-26 17:38:57,431 INFO torchdistill.misc.log Epoch: [0] [6000/6547] eta: 0:05:02 lr: 1.3889313171427119e-05 sample/s: 6.97960684730679 loss: 0.2053 (0.2974) time: 0.5522 data: 0.0026 max mem: 9056
+ 2021-05-26 17:43:36,576 INFO torchdistill.misc.log Epoch: [0] [6500/6547] eta: 0:00:25 lr: 1.3380174125553688e-05 sample/s: 6.5825322864402445 loss: 0.2341 (0.2939) time: 0.5723 data: 0.0026 max mem: 9056
+ 2021-05-26 17:44:01,639 INFO torchdistill.misc.log Epoch: [0] Total time: 1:00:17
+ 2021-05-26 17:45:01,865 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+ 2021-05-26 17:45:01,866 INFO __main__ Validation: accuracy = 0.9198242723778144
+ 2021-05-26 17:45:01,866 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-large-uncased
+ 2021-05-26 17:45:06,995 INFO torchdistill.misc.log Epoch: [1] [ 0/6547] eta: 0:54:19 lr: 1.3332315055241587e-05 sample/s: 8.160307088582721 loss: 0.0503 (0.0503) time: 0.4978 data: 0.0076 max mem: 9056
+ 2021-05-26 17:49:46,069 INFO torchdistill.misc.log Epoch: [1] [ 500/6547] eta: 0:56:14 lr: 1.282317600936816e-05 sample/s: 6.217419742879262 loss: 0.0503 (0.2035) time: 0.5897 data: 0.0025 max mem: 9056
+ 2021-05-26 17:54:23,673 INFO torchdistill.misc.log Epoch: [1] [1000/6547] eta: 0:51:27 lr: 1.231403696349473e-05 sample/s: 6.589991684548619 loss: 0.3169 (0.2504) time: 0.5670 data: 0.0026 max mem: 9056
+ 2021-05-26 17:58:59,164 INFO torchdistill.misc.log Epoch: [1] [1500/6547] eta: 0:46:39 lr: 1.1804897917621303e-05 sample/s: 7.990056015872275 loss: 0.1724 (0.2513) time: 0.5467 data: 0.0025 max mem: 9056
+ 2021-05-26 18:03:36,412 INFO torchdistill.misc.log Epoch: [1] [2000/6547] eta: 0:42:02 lr: 1.1295758871747876e-05 sample/s: 5.847794095606037 loss: 0.2595 (0.2506) time: 0.5593 data: 0.0025 max mem: 9056
+ 2021-05-26 18:08:09,765 INFO torchdistill.misc.log Epoch: [1] [2500/6547] eta: 0:37:18 lr: 1.0786619825874447e-05 sample/s: 7.990375667126102 loss: 0.0139 (0.2572) time: 0.5289 data: 0.0025 max mem: 9056
+ 2021-05-26 18:12:47,367 INFO torchdistill.misc.log Epoch: [1] [3000/6547] eta: 0:32:43 lr: 1.027748078000102e-05 sample/s: 6.973642075580554 loss: 0.1740 (0.2673) time: 0.5332 data: 0.0026 max mem: 9056
+ 2021-05-26 18:17:24,171 INFO torchdistill.misc.log Epoch: [1] [3500/6547] eta: 0:28:06 lr: 9.76834173412759e-06 sample/s: 8.607342080623853 loss: 0.4704 (0.2760) time: 0.5707 data: 0.0025 max mem: 9056
+ 2021-05-26 18:22:01,335 INFO torchdistill.misc.log Epoch: [1] [4000/6547] eta: 0:23:29 lr: 9.259202688254163e-06 sample/s: 8.61629355725034 loss: 0.1867 (0.2859) time: 0.5565 data: 0.0025 max mem: 9056
+ 2021-05-26 18:26:39,703 INFO torchdistill.misc.log Epoch: [1] [4500/6547] eta: 0:18:53 lr: 8.750063642380736e-06 sample/s: 8.62179301385266 loss: 0.1528 (0.2933) time: 0.5657 data: 0.0025 max mem: 9056
+ 2021-05-26 18:31:13,854 INFO torchdistill.misc.log Epoch: [1] [5000/6547] eta: 0:14:16 lr: 8.240924596507307e-06 sample/s: 6.5886769838330705 loss: 0.1091 (0.2886) time: 0.5920 data: 0.0026 max mem: 9056
+ 2021-05-26 18:35:47,685 INFO torchdistill.misc.log Epoch: [1] [5500/6547] eta: 0:09:38 lr: 7.73178555063388e-06 sample/s: 6.590584506649424 loss: 0.0833 (0.2890) time: 0.5445 data: 0.0025 max mem: 9056
+ 2021-05-26 18:40:23,656 INFO torchdistill.misc.log Epoch: [1] [6000/6547] eta: 0:05:02 lr: 7.222646504760451e-06 sample/s: 8.620140668279657 loss: 0.5154 (0.2930) time: 0.5567 data: 0.0026 max mem: 9056
+ 2021-05-26 18:44:58,148 INFO torchdistill.misc.log Epoch: [1] [6500/6547] eta: 0:00:25 lr: 6.713507458887023e-06 sample/s: 5.843950385754279 loss: 0.0087 (0.3004) time: 0.5434 data: 0.0025 max mem: 9056
+ 2021-05-26 18:45:23,286 INFO torchdistill.misc.log Epoch: [1] Total time: 1:00:16
+ 2021-05-26 18:46:23,512 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+ 2021-05-26 18:46:23,512 INFO __main__ Validation: accuracy = 0.9207395204100312
+ 2021-05-26 18:46:23,513 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-large-uncased
+ 2021-05-26 18:46:28,930 INFO torchdistill.misc.log Epoch: [2] [ 0/6547] eta: 0:53:33 lr: 6.66564838857492e-06 sample/s: 8.299093972413548 loss: 0.0001 (0.0001) time: 0.4908 data: 0.0088 max mem: 9056
+ 2021-05-26 18:51:02,998 INFO torchdistill.misc.log Epoch: [2] [ 500/6547] eta: 0:55:13 lr: 6.156509342701492e-06 sample/s: 6.973630480902882 loss: 0.0000 (0.2118) time: 0.5519 data: 0.0025 max mem: 9056
+ 2021-05-26 18:55:38,010 INFO torchdistill.misc.log Epoch: [2] [1000/6547] eta: 0:50:45 lr: 5.647370296828064e-06 sample/s: 7.982749071696792 loss: 0.0000 (0.2042) time: 0.5058 data: 0.0025 max mem: 9056
+ 2021-05-26 19:00:14,646 INFO torchdistill.misc.log Epoch: [2] [1500/6547] eta: 0:46:18 lr: 5.1382312509546365e-06 sample/s: 7.044053211160169 loss: 0.0000 (0.1978) time: 0.5426 data: 0.0025 max mem: 9056
+ 2021-05-26 19:04:48,464 INFO torchdistill.misc.log Epoch: [2] [2000/6547] eta: 0:41:39 lr: 4.629092205081208e-06 sample/s: 7.044916908254754 loss: 0.0000 (0.1990) time: 0.5594 data: 0.0026 max mem: 9056
+ 2021-05-26 19:09:23,080 INFO torchdistill.misc.log Epoch: [2] [2500/6547] eta: 0:37:04 lr: 4.11995315920778e-06 sample/s: 7.037732786049721 loss: 0.0000 (0.1949) time: 0.5999 data: 0.0026 max mem: 9056
+ 2021-05-26 19:13:56,849 INFO torchdistill.misc.log Epoch: [2] [3000/6547] eta: 0:32:28 lr: 3.6108141133343523e-06 sample/s: 6.294816645236476 loss: 0.0000 (0.1968) time: 0.5467 data: 0.0025 max mem: 9056
+ 2021-05-26 19:18:30,816 INFO torchdistill.misc.log Epoch: [2] [3500/6547] eta: 0:27:53 lr: 3.1016750674609237e-06 sample/s: 7.4227914858222395 loss: 0.0000 (0.1972) time: 0.5436 data: 0.0025 max mem: 9056
+ 2021-05-26 19:23:04,965 INFO torchdistill.misc.log Epoch: [2] [4000/6547] eta: 0:23:18 lr: 2.5925360215874956e-06 sample/s: 5.55695934201043 loss: 0.0000 (0.1956) time: 0.5532 data: 0.0025 max mem: 9056
+ 2021-05-26 19:27:38,760 INFO torchdistill.misc.log Epoch: [2] [4500/6547] eta: 0:18:43 lr: 2.083396975714068e-06 sample/s: 6.64891320975452 loss: 0.0000 (0.1942) time: 0.5342 data: 0.0026 max mem: 9056
+ 2021-05-26 19:32:17,803 INFO torchdistill.misc.log Epoch: [2] [5000/6547] eta: 0:14:10 lr: 1.5742579298406396e-06 sample/s: 7.037428722675103 loss: 0.0000 (0.1911) time: 0.5242 data: 0.0026 max mem: 9056
+ 2021-05-26 19:36:50,820 INFO torchdistill.misc.log Epoch: [2] [5500/6547] eta: 0:09:35 lr: 1.0651188839672114e-06 sample/s: 7.036864946528987 loss: 0.0000 (0.1905) time: 0.5605 data: 0.0026 max mem: 9056
+ 2021-05-26 19:41:22,137 INFO torchdistill.misc.log Epoch: [2] [6000/6547] eta: 0:05:00 lr: 5.559798380937835e-07 sample/s: 8.73211181456121 loss: 0.0000 (0.1899) time: 0.5563 data: 0.0025 max mem: 9056
+ 2021-05-26 19:45:58,549 INFO torchdistill.misc.log Epoch: [2] [6500/6547] eta: 0:00:25 lr: 4.6840792220355385e-08 sample/s: 7.427628811940013 loss: 0.0000 (0.1888) time: 0.5204 data: 0.0025 max mem: 9056
+ 2021-05-26 19:46:24,589 INFO torchdistill.misc.log Epoch: [2] Total time: 0:59:56
+ 2021-05-26 19:47:24,703 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+ 2021-05-26 19:47:24,703 INFO __main__ Validation: accuracy = 0.9222039172615779
+ 2021-05-26 19:47:24,703 INFO __main__ Updating ckpt at ./resource/ckpt/glue/qnli/ce/qnli-bert-large-uncased
+ 2021-05-26 19:47:35,803 INFO __main__ [Student: bert-large-uncased]
+ 2021-05-26 19:48:35,910 INFO /usr/local/lib/python3.7/dist-packages/datasets/metric.py Removing /root/.cache/huggingface/metrics/glue/qnli/default_experiment-1-0.arrow
+ 2021-05-26 19:48:35,910 INFO __main__ Test: accuracy = 0.9222039172615779
+ 2021-05-26 19:48:35,910 INFO __main__ Start prediction for private dataset(s)
+ 2021-05-26 19:48:35,912 INFO __main__ qnli/test: 5463 samples
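
Note on the two runs: the iteration counts and logged `lr` values hint at what "tuned hyperparameters" changed. The sketch below is a rough consistency check, not the training code. It assumes the QNLI training split has 104,743 examples, that the batch size went from 32 to 16, and that the base learning rate is 2e-5 with linear decay to zero and no warmup. None of these values appears in the log; they are inferred from the 3274 vs. 6547 iterations per epoch and from the `lr` column. The helper names are hypothetical.

```python
import math

# Assumed values (not stated in the log; inferred from it).
QNLI_TRAIN_EXAMPLES = 104_743
BASE_LR = 2e-5
NUM_EPOCHS = 3


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


def linear_decay_lr(base_lr: float, steps_done: int, total_steps: int) -> float:
    """Learning rate after `steps_done` optimizer steps under linear decay to zero."""
    return base_lr * max(0.0, 1.0 - steps_done / total_steps)


old_steps = steps_per_epoch(QNLI_TRAIN_EXAMPLES, 32)  # matches [ 0/3274] in the old log
new_steps = steps_per_epoch(QNLI_TRAIN_EXAMPLES, 16)  # matches [ 0/6547] in the new log

# The first logged lr of each run corresponds to one completed scheduler step.
old_first_lr = linear_decay_lr(BASE_LR, 1, NUM_EPOCHS * old_steps)
new_first_lr = linear_decay_lr(BASE_LR, 1, NUM_EPOCHS * new_steps)

print(old_steps, new_steps)
print(old_first_lr, new_first_lr)
```

Under these assumptions the computed values agree with the first `lr` entries of both runs (about 1.99980e-05 and 1.99990e-05), which is consistent with the batch size being halved while the base learning rate and epoch count stayed the same.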