nohup: ignoring input
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
No config specified, defaulting to: apps/all
Found cached dataset apps (/home/user/.cache/huggingface/datasets/codeparrot___apps/all/0.0.0/04ac807715d07d6e5cc580f59cdc8213cd7dc4529d0bb819cca72c9f8e8c1aa5)
Max length: 2048
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
GPU memory occupied: 2667 MB.
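The `GPU memory occupied: N MB.` lines, here and after training, closely resemble the output of the NVML-based `print_gpu_utilization` helper in the Hugging Face Transformers guide on efficient single-GPU training. The script that produced this log is not shown, so the sketch below is an assumption. It relies on the `pynvml` bindings (assumed installed) and reports device-wide usage in MiB. That figure includes the CUDA context and memory still held by PyTorch's caching allocator, not only live tensors, which likely explains why the post-training reading is so close to the card's capacity.

```python
"""Sketch of a GPU-memory readout in the format seen in this log (assumed helper)."""


def format_gpu_memory(used_bytes: int) -> str:
    # NVML reports bytes; integer-divide by 1024**2 to get MiB (printed as "MB").
    return f"GPU memory occupied: {used_bytes // 1024**2} MB."


def print_gpu_utilization(device_index: int = 0) -> None:
    # Imported lazily so this module still loads on machines without pynvml.
    from pynvml import (
        nvmlDeviceGetHandleByIndex,
        nvmlDeviceGetMemoryInfo,
        nvmlInit,
        nvmlShutdown,
    )

    nvmlInit()
    try:
        handle = nvmlDeviceGetHandleByIndex(device_index)
        info = nvmlDeviceGetMemoryInfo(handle)  # fields .used/.free/.total, in bytes
        print(format_gpu_memory(info.used))
    finally:
        nvmlShutdown()


if __name__ == "__main__":
    # Re-create the two readings from this log from byte counts.
    print(format_gpu_memory(2667 * 1024**2))   # GPU memory occupied: 2667 MB.
    print(format_gpu_memory(81547 * 1024**2))  # GPU memory occupied: 81547 MB.
```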
0%| | 0/155 [00:00<?, ?it/s]
1%| | 1/155 [00:03<09:53, 3.85s/it]
{'loss': 28.4696, 'learning_rate': 0.0, 'epoch': 0.03}
1%|▏ | 2/155 [00:07<08:58, 3.52s/it]
2%|▏ | 3/155 [00:10<08:38, 3.41s/it]
3%|▎ | 4/155 [00:13<08:29, 3.37s/it]
3%|▎ | 5/155 [00:17<08:22, 3.35s/it]
{'loss': 26.9592, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.16}
4%|▍ | 6/155 [00:20<08:16, 3.33s/it]
5%|▍ | 7/155 [00:23<08:12, 3.33s/it]
5%|▌ | 8/155 [00:26<08:08, 3.32s/it]
6%|▌ | 9/155 [00:30<08:04, 3.32s/it]
6%|▋ | 10/155 [00:33<08:01, 3.32s/it]
{'loss': 24.5245, 'learning_rate': 3.5000000000000004e-06, 'epoch': 0.31}
7%|▋ | 11/155 [00:36<07:57, 3.32s/it]
8%|▊ | 12/155 [00:40<07:54, 3.32s/it]
8%|▊ | 13/155 [00:43<07:51, 3.32s/it]
9%|▉ | 14/155 [00:46<07:48, 3.32s/it]
10%|▉ | 15/155 [00:50<07:45, 3.32s/it]
{'loss': 24.7316, 'learning_rate': 6e-06, 'epoch': 0.47}
10%|█ | 16/155 [00:53<07:41, 3.32s/it]
11%|█ | 17/155 [00:56<07:38, 3.32s/it]
12%|█▏ | 18/155 [01:00<07:35, 3.32s/it]
12%|█▏ | 19/155 [01:03<07:32, 3.32s/it]
13%|█▎ | 20/155 [01:06<07:28, 3.32s/it]
{'loss': 16.8561, 'learning_rate': 8.500000000000002e-06, 'epoch': 0.63}
14%|█▎ | 21/155 [01:10<07:25, 3.32s/it]
14%|█▍ | 22/155 [01:13<07:22, 3.33s/it]
15%|█▍ | 23/155 [01:16<07:19, 3.33s/it]
15%|█▌ | 24/155 [01:20<07:15, 3.33s/it]
16%|█▌ | 25/155 [01:23<07:12, 3.33s/it]
{'loss': 5.1585, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.79}
17%|█▋ | 26/155 [01:26<07:09, 3.33s/it]
17%|█▋ | 27/155 [01:30<07:06, 3.33s/it]
18%|█▊ | 28/155 [01:33<07:02, 3.33s/it]
19%|█▊ | 29/155 [01:36<06:59, 3.33s/it]
19%|█▉ | 30/155 [01:40<06:56, 3.33s/it]
{'loss': 1.6063, 'learning_rate': 1.3500000000000001e-05, 'epoch': 0.94}
20%|██ | 31/155 [01:43<06:53, 3.33s/it]
21%|██ | 32/155 [01:49<08:21, 4.08s/it]
21%|██▏ | 33/155 [01:52<07:50, 3.85s/it]
22%|██▏ | 34/155 [01:55<07:27, 3.70s/it]
23%|██▎ | 35/155 [01:59<07:10, 3.59s/it]
{'loss': 1.1341, 'learning_rate': 1.6000000000000003e-05, 'epoch': 1.13}
23%|██▎ | 36/155 [02:02<06:57, 3.51s/it]
24%|██▍ | 37/155 [02:05<06:48, 3.46s/it]
25%|██▍ | 38/155 [02:09<06:40, 3.42s/it]
25%|██▌ | 39/155 [02:12<06:33, 3.39s/it]
26%|██▌ | 40/155 [02:15<06:27, 3.37s/it]
{'loss': 0.8484, 'learning_rate': 1.85e-05, 'epoch': 1.28}
26%|██▋ | 41/155 [02:19<06:23, 3.36s/it]
27%|██▋ | 42/155 [02:22<06:18, 3.35s/it]
28%|██▊ | 43/155 [02:25<06:14, 3.34s/it]
28%|██▊ | 44/155 [02:29<06:10, 3.34s/it]
29%|██▉ | 45/155 [02:32<06:06, 3.34s/it]
{'loss': 0.777, 'learning_rate': 2.1e-05, 'epoch': 1.44}
30%|██▉ | 46/155 [02:35<06:03, 3.33s/it]
30%|███ | 47/155 [02:39<05:59, 3.33s/it]
31%|███ | 48/155 [02:42<05:56, 3.33s/it]
32%|███▏ | 49/155 [02:45<05:53, 3.33s/it]
32%|███▏ | 50/155 [02:49<05:49, 3.33s/it]
{'loss': 0.712, 'learning_rate': 2.35e-05, 'epoch': 1.6}
33%|███▎ | 51/155 [02:52<05:46, 3.33s/it]
34%|███▎ | 52/155 [02:55<05:43, 3.33s/it]
34%|███▍ | 53/155 [02:59<05:39, 3.33s/it]
35%|███▍ | 54/155 [03:02<05:36, 3.33s/it]
35%|███▌ | 55/155 [03:05<05:33, 3.33s/it]
{'loss': 0.6714, 'learning_rate': 2.6000000000000002e-05, 'epoch': 1.76}
36%|███▌ | 56/155 [03:09<05:29, 3.33s/it]
37%|███▋ | 57/155 [03:12<05:26, 3.33s/it]
37%|███▋ | 58/155 [03:15<05:22, 3.33s/it]
38%|███▊ | 59/155 [03:19<05:19, 3.33s/it]
39%|███▊ | 60/155 [03:22<05:16, 3.33s/it]
{'loss': 0.6215, 'learning_rate': 2.8499999999999998e-05, 'epoch': 1.91}
39%|███▉ | 61/155 [03:25<05:12, 3.33s/it]
40%|████ | 62/155 [03:29<05:09, 3.33s/it]
41%|████ | 63/155 [03:34<06:14, 4.07s/it]
41%|████▏ | 64/155 [03:38<05:50, 3.85s/it]
42%|████▏ | 65/155 [03:41<05:32, 3.69s/it]
{'loss': 0.6794, 'learning_rate': 3.1e-05, 'epoch': 2.09}
43%|████▎ | 66/155 [03:44<05:19, 3.59s/it]
43%|████▎ | 67/155 [03:48<05:08, 3.51s/it]
44%|████▍ | 68/155 [03:51<05:00, 3.45s/it]
45%|████▍ | 69/155 [03:54<04:53, 3.42s/it]
45%|████▌ | 70/155 [03:58<04:48, 3.39s/it]
{'loss': 0.5588, 'learning_rate': 3.35e-05, 'epoch': 2.25}
46%|████▌ | 71/155 [04:01<04:43, 3.37s/it]
46%|████▋ | 72/155 [04:04<04:38, 3.36s/it]
47%|████▋ | 73/155 [04:08<04:34, 3.35s/it]
48%|████▊ | 74/155 [04:11<04:30, 3.34s/it]
48%|████▊ | 75/155 [04:14<04:27, 3.34s/it]
{'loss': 0.5589, 'learning_rate': 3.6e-05, 'epoch': 2.41}
49%|████▉ | 76/155 [04:18<04:23, 3.34s/it]
50%|████▉ | 77/155 [04:21<04:20, 3.33s/it]
50%|█████ | 78/155 [04:24<04:16, 3.33s/it]
51%|█████ | 79/155 [04:28<04:13, 3.33s/it]
52%|█████▏ | 80/155 [04:31<04:09, 3.33s/it]
{'loss': 0.5018, 'learning_rate': 3.85e-05, 'epoch': 2.57}
52%|█████▏ | 81/155 [04:34<04:06, 3.33s/it]
53%|█████▎ | 82/155 [04:38<04:03, 3.33s/it]
54%|█████▎ | 83/155 [04:41<03:59, 3.33s/it]
54%|█████▍ | 84/155 [04:44<03:56, 3.33s/it]
55%|█████▍ | 85/155 [04:48<03:52, 3.33s/it]
{'loss': 0.5036, 'learning_rate': 4.1e-05, 'epoch': 2.72}
55%|█████▌ | 86/155 [04:51<03:49, 3.33s/it]
56%|█████▌ | 87/155 [04:54<03:46, 3.33s/it]
57%|█████▋ | 88/155 [04:58<03:43, 3.33s/it]
57%|█████▋ | 89/155 [05:01<03:39, 3.33s/it]
58%|█████▊ | 90/155 [05:04<03:36, 3.33s/it]
{'loss': 0.4761, 'learning_rate': 4.35e-05, 'epoch': 2.88}
59%|█████▊ | 91/155 [05:08<03:33, 3.33s/it]
59%|█████▉ | 92/155 [05:11<03:29, 3.33s/it]
60%|██████ | 93/155 [05:14<03:26, 3.33s/it]
61%|██████ | 94/155 [05:20<04:08, 4.07s/it]
61%|██████▏ | 95/155 [05:23<03:51, 3.85s/it]
{'loss': 0.5469, 'learning_rate': 4.600000000000001e-05, 'epoch': 3.06}
62%|██████▏ | 96/155 [05:27<03:37, 3.69s/it]
63%|██████▎ | 97/155 [05:30<03:27, 3.58s/it]
63%|██████▎ | 98/155 [05:33<03:19, 3.51s/it]
64%|██████▍ | 99/155 [05:37<03:13, 3.45s/it]
65%|██████▍ | 100/155 [05:40<03:07, 3.42s/it]
{'loss': 0.4409, 'learning_rate': 4.85e-05, 'epoch': 3.22}
65%|██████▌ | 101/155 [05:43<03:03, 3.39s/it]
66%|██████▌ | 102/155 [05:47<02:58, 3.37s/it]
66%|██████▋ | 103/155 [05:50<02:54, 3.36s/it]
67%|██████▋ | 104/155 [05:53<02:50, 3.35s/it]
68%|██████▊ | 105/155 [05:57<02:46, 3.34s/it]
{'loss': 0.4152, 'learning_rate': 4.8181818181818186e-05, 'epoch': 3.38}
68%|██████▊ | 106/155 [06:00<02:43, 3.34s/it]
69%|██████▉ | 107/155 [06:03<02:39, 3.33s/it]
70%|██████▉ | 108/155 [06:07<02:36, 3.33s/it]
70%|███████ | 109/155 [06:10<02:33, 3.33s/it]
71%|███████ | 110/155 [06:13<02:29, 3.33s/it]
{'loss': 0.4137, 'learning_rate': 4.3636363636363636e-05, 'epoch': 3.54}
72%|███████▏ | 111/155 [06:17<02:26, 3.33s/it]
72%|███████▏ | 112/155 [06:20<02:23, 3.33s/it]
73%|███████▎ | 113/155 [06:23<02:19, 3.33s/it]
74%|███████▎ | 114/155 [06:27<02:16, 3.33s/it]
74%|███████▍ | 115/155 [06:30<02:13, 3.33s/it]
{'loss': 0.3899, 'learning_rate': 3.909090909090909e-05, 'epoch': 3.69}
75%|███████▍ | 116/155 [06:33<02:09, 3.33s/it]
75%|███████▌ | 117/155 [06:37<02:06, 3.33s/it]
76%|███████▌ | 118/155 [06:40<02:03, 3.33s/it]
77%|███████▋ | 119/155 [06:43<01:59, 3.33s/it]
77%|███████▋ | 120/155 [06:47<01:56, 3.33s/it]
{'loss': 0.3748, 'learning_rate': 3.454545454545455e-05, 'epoch': 3.85}
78%|███████▊ | 121/155 [06:50<01:53, 3.33s/it]
79%|███████▊ | 122/155 [06:53<01:49, 3.33s/it]
79%|███████▉ | 123/155 [06:57<01:46, 3.32s/it]
80%|████████ | 124/155 [07:00<01:43, 3.32s/it]
81%|████████ | 125/155 [07:06<02:02, 4.07s/it]
{'loss': 0.4015, 'learning_rate': 3e-05, 'epoch': 4.03}
81%|████████▏ | 126/155 [07:09<01:51, 3.85s/it]
82%|████████▏ | 127/155 [07:12<01:43, 3.69s/it]
83%|████████▎ | 128/155 [07:16<01:36, 3.58s/it]
83%|████████▎ | 129/155 [07:19<01:31, 3.50s/it]
84%|████████▍ | 130/155 [07:22<01:26, 3.45s/it]
{'loss': 0.36, 'learning_rate': 2.5454545454545454e-05, 'epoch': 4.19}
85%|████████▍ | 131/155 [07:26<01:21, 3.41s/it]
85%|████████▌ | 132/155 [07:29<01:17, 3.39s/it]
86%|████████▌ | 133/155 [07:32<01:14, 3.37s/it]
86%|████████▋ | 134/155 [07:36<01:10, 3.36s/it]
87%|████████▋ | 135/155 [07:39<01:06, 3.35s/it]
{'loss': 0.3299, 'learning_rate': 2.090909090909091e-05, 'epoch': 4.35}
88%|████████▊ | 136/155 [07:42<01:03, 3.34s/it]
88%|████████▊ | 137/155 [07:46<01:00, 3.33s/it]
89%|████████▉ | 138/155 [07:49<00:56, 3.33s/it]
90%|████████▉ | 139/155 [07:52<00:53, 3.33s/it]
90%|█████████ | 140/155 [07:56<00:49, 3.33s/it]
{'loss': 0.3332, 'learning_rate': 1.6363636363636366e-05, 'epoch': 4.5}
91%|█████████ | 141/155 [07:59<00:46, 3.33s/it]
92%|█████████▏| 142/155 [08:02<00:43, 3.32s/it]
92%|█████████▏| 143/155 [08:06<00:39, 3.32s/it]
93%|█████████▎| 144/155 [08:09<00:36, 3.32s/it]
94%|█████████▎| 145/155 [08:12<00:33, 3.32s/it]
{'loss': 0.307, 'learning_rate': 1.1818181818181819e-05, 'epoch': 4.66}
94%|█████████▍| 146/155 [08:16<00:29, 3.32s/it]
95%|█████████▍| 147/155 [08:19<00:26, 3.32s/it]
95%|█████████▌| 148/155 [08:22<00:23, 3.32s/it]
96%|█████████▌| 149/155 [08:26<00:19, 3.33s/it]
97%|█████████▋| 150/155 [08:29<00:16, 3.33s/it]
{'loss': 0.2973, 'learning_rate': 7.272727272727272e-06, 'epoch': 4.82}
97%|█████████▋| 151/155 [08:32<00:13, 3.33s/it]
98%|█████████▊| 152/155 [08:36<00:09, 3.33s/it]
99%|█████████▊| 153/155 [08:39<00:06, 3.33s/it]
99%|█████████▉| 154/155 [08:42<00:03, 3.33s/it]
100%|██████████| 155/155 [08:46<00:00, 3.33s/it]
{'loss': 0.3096, 'learning_rate': 2.7272727272727272e-06, 'epoch': 4.98}
{'train_runtime': 526.0281, 'train_samples_per_second': 14.543, 'train_steps_per_second': 0.295, 'train_loss': 3.648434014474192, 'epoch': 4.98}
100%|██████████| 155/155 [08:46<00:00, 3.39s/it]
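The logged `learning_rate` values fit a linear warmup to a peak of 5e-5 over 100 scheduler steps, followed by linear decay to zero at step 155. That is the shape produced by Transformers' `get_linear_schedule_with_warmup`. The peak, warmup length, and a 3-step lag are inferred from the numbers in this log, not read from the training script. From step 5 onward each logged value equals the schedule evaluated at `global_step - 3`, and step 1 logs 0.0. This is consistent with a few early optimizer steps being skipped without advancing the scheduler (for example by fp16 dynamic loss scaling), but that cause is an assumption. The sketch re-implements the formula in plain Python and checks it against sampled log entries.

```python
import math


def linear_warmup_decay_lr(
    step: int, peak_lr: float = 5e-5, warmup_steps: int = 100, total_steps: int = 155
) -> float:
    """LR after `step` scheduler steps: linear warmup, then linear decay to 0.

    peak_lr, warmup_steps and total_steps are inferred from this log (assumptions).
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# (global step, logged learning_rate) pairs copied from the log above.
LOGGED = [
    (5, 1.0000000000000002e-06),
    (10, 3.5000000000000004e-06),
    (50, 2.35e-05),
    (100, 4.85e-05),
    (105, 4.8181818181818186e-05),
    (125, 3e-05),
    (155, 2.7272727272727272e-06),
]

# Assumed lag: optimizer steps early in training that did not advance the scheduler.
SKIPPED_STEPS = 3

for global_step, logged_lr in LOGGED:
    predicted = linear_warmup_decay_lr(global_step - SKIPPED_STEPS)
    assert math.isclose(predicted, logged_lr, rel_tol=1e-9), (global_step, predicted, logged_lr)
print("schedule matches all sampled log entries")
```

Under the same assumptions, the peak of 5e-5 itself never appears in the log because logging happens every 5 steps and the peak falls on scheduler step 100 (global step 103).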
Time: 526.03
Samples/second: 14.54
GPU memory occupied: 81547 MB.
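The closing `Time:` and `Samples/second:` lines are `train_runtime` and `train_samples_per_second` from the final metrics dict, rounded to two decimals, in the style of the `print_summary` helper from the Hugging Face single-GPU training guide (an assumption; the helper names below are hypothetical). Multiplying the runtime by the two rates recovers about 7,650 samples and 155 optimizer steps. If the run was 5 epochs (the final `epoch` is 4.98) and the samples figure counts dataset examples times epochs, that is 1,530 examples and 31 optimizer steps per epoch. This lines up with the first step of each new epoch (steps 32, 63, 94 and 125) taking 5 to 6 s of wall time instead of about 3.3 s. The cause of that per-epoch overhead, such as dataloader re-initialisation, is also an assumption.

```python
from __future__ import annotations


def format_summary(metrics: dict) -> list[str]:
    """Hypothetical helper: the two closing summary lines of this log."""
    return [
        f"Time: {metrics['train_runtime']:.2f}",
        f"Samples/second: {metrics['train_samples_per_second']:.2f}",
    ]


def implied_totals(metrics: dict) -> tuple[int, int]:
    """Recover (samples, optimizer steps) as runtime x throughput, rounded."""
    runtime = metrics["train_runtime"]
    samples = round(runtime * metrics["train_samples_per_second"])
    steps = round(runtime * metrics["train_steps_per_second"])
    return samples, steps


# Final metrics dict copied from the log.
METRICS = {
    "train_runtime": 526.0281,
    "train_samples_per_second": 14.543,
    "train_steps_per_second": 0.295,
    "train_loss": 3.648434014474192,
    "epoch": 4.98,
}

for line in format_summary(METRICS):
    print(line)  # prints "Time: 526.03", then "Samples/second: 14.54"

samples, steps = implied_totals(METRICS)
print(samples, steps)  # 7650 155

# Per-epoch figures, assuming the run was configured for 5 epochs.
print(samples / 5, steps / 5)  # 1530.0 31.0
```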