Upload experiments/2022-10-19-515cf3b9155fd406d5067b25b7a969d2fc7be8e238d63667d772142982e8e3ff with huggingface_hub
ff29165
nohup: ignoring input
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
No config specified, defaulting to: apps/all
Found cached dataset apps (/home/user/.cache/huggingface/datasets/codeparrot___apps/all/0.0.0/04ac807715d07d6e5cc580f59cdc8213cd7dc4529d0bb819cca72c9f8e8c1aa5)
Max length: 2048
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
GPU memory occupied: 2667 MB.
  1%| 1/155 [00:03<09:53, 3.85s/it] {'loss': 28.4696, 'learning_rate': 0.0, 'epoch': 0.03}
  3%| 5/155 [00:17<08:22, 3.35s/it] {'loss': 26.9592, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.16}
  6%| 10/155 [00:33<08:01, 3.32s/it] {'loss': 24.5245, 'learning_rate': 3.5000000000000004e-06, 'epoch': 0.31}
 10%| 15/155 [00:50<07:45, 3.32s/it] {'loss': 24.7316, 'learning_rate': 6e-06, 'epoch': 0.47}
 13%| 20/155 [01:06<07:28, 3.32s/it] {'loss': 16.8561, 'learning_rate': 8.500000000000002e-06, 'epoch': 0.63}
 16%| 25/155 [01:23<07:12, 3.33s/it] {'loss': 5.1585, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.79}
 19%| 30/155 [01:40<06:56, 3.33s/it] {'loss': 1.6063, 'learning_rate': 1.3500000000000001e-05, 'epoch': 0.94}
 23%| 35/155 [01:59<07:10, 3.59s/it] {'loss': 1.1341, 'learning_rate': 1.6000000000000003e-05, 'epoch': 1.13}
 26%| 40/155 [02:15<06:27, 3.37s/it] {'loss': 0.8484, 'learning_rate': 1.85e-05, 'epoch': 1.28}
 29%| 45/155 [02:32<06:06, 3.34s/it] {'loss': 0.777, 'learning_rate': 2.1e-05, 'epoch': 1.44}
 32%| 50/155 [02:49<05:49, 3.33s/it] {'loss': 0.712, 'learning_rate': 2.35e-05, 'epoch': 1.6}
 35%| 55/155 [03:05<05:33, 3.33s/it] {'loss': 0.6714, 'learning_rate': 2.6000000000000002e-05, 'epoch': 1.76}
 39%| 60/155 [03:22<05:16, 3.33s/it] {'loss': 0.6215, 'learning_rate': 2.8499999999999998e-05, 'epoch': 1.91}
 42%| 65/155 [03:41<05:32, 3.69s/it] {'loss': 0.6794, 'learning_rate': 3.1e-05, 'epoch': 2.09}
 45%| 70/155 [03:58<04:48, 3.39s/it] {'loss': 0.5588, 'learning_rate': 3.35e-05, 'epoch': 2.25}
 48%| 75/155 [04:14<04:27, 3.34s/it] {'loss': 0.5589, 'learning_rate': 3.6e-05, 'epoch': 2.41}
 52%| 80/155 [04:31<04:09, 3.33s/it] {'loss': 0.5018, 'learning_rate': 3.85e-05, 'epoch': 2.57}
 55%| 85/155 [04:48<03:52, 3.33s/it] {'loss': 0.5036, 'learning_rate': 4.1e-05, 'epoch': 2.72}
 58%| 90/155 [05:04<03:36, 3.33s/it] {'loss': 0.4761, 'learning_rate': 4.35e-05, 'epoch': 2.88}
 61%| 95/155 [05:23<03:51, 3.85s/it] {'loss': 0.5469, 'learning_rate': 4.600000000000001e-05, 'epoch': 3.06}
 65%| 100/155 [05:40<03:07, 3.42s/it] {'loss': 0.4409, 'learning_rate': 4.85e-05, 'epoch': 3.22}
 68%| 105/155 [05:57<02:46, 3.34s/it] {'loss': 0.4152, 'learning_rate': 4.8181818181818186e-05, 'epoch': 3.38}
 71%| 110/155 [06:13<02:29, 3.33s/it] {'loss': 0.4137, 'learning_rate': 4.3636363636363636e-05, 'epoch': 3.54}
 74%| 115/155 [06:30<02:13, 3.33s/it] {'loss': 0.3899, 'learning_rate': 3.909090909090909e-05, 'epoch': 3.69}
 77%| 120/155 [06:47<01:56, 3.33s/it] {'loss': 0.3748, 'learning_rate': 3.454545454545455e-05, 'epoch': 3.85}
 81%| 125/155 [07:06<02:02, 4.07s/it] {'loss': 0.4015, 'learning_rate': 3e-05, 'epoch': 4.03}
 84%| 130/155 [07:22<01:26, 3.45s/it] {'loss': 0.36, 'learning_rate': 2.5454545454545454e-05, 'epoch': 4.19}
 87%| 135/155 [07:39<01:06, 3.35s/it] {'loss': 0.3299, 'learning_rate': 2.090909090909091e-05, 'epoch': 4.35}
 90%| 140/155 [07:56<00:49, 3.33s/it] {'loss': 0.3332, 'learning_rate': 1.6363636363636366e-05, 'epoch': 4.5}
 94%| 145/155 [08:12<00:33, 3.32s/it] {'loss': 0.307, 'learning_rate': 1.1818181818181819e-05, 'epoch': 4.66}
 97%| 150/155 [08:29<00:16, 3.33s/it] {'loss': 0.2973, 'learning_rate': 7.272727272727272e-06, 'epoch': 4.82}
100%| 155/155 [08:46<00:00, 3.33s/it] {'loss': 0.3096, 'learning_rate': 2.7272727272727272e-06, 'epoch': 4.98}
100%| 155/155 [08:46<00:00, 3.33s/it] {'train_runtime': 526.0281, 'train_samples_per_second': 14.543, 'train_steps_per_second': 0.295, 'train_loss': 3.648434014474192, 'epoch': 4.98}
100%| 155/155 [08:46<00:00, 3.39s/it]
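Note on the logged learning rates: the values are consistent with a linear warmup followed by a linear decay to zero at step 155. The sketch below reproduces them under assumptions that are inferred from the numbers and not stated anywhere in the log: a peak rate of 5e-5, 100 warmup steps, and a scheduler that trails the step counter by three steps. The cause of that lag is not visible in the log. All names in the sketch are hypothetical.

```python
# Sketch: check the logged learning rates against an ASSUMED schedule.
# PEAK_LR, WARMUP_STEPS and SCHEDULER_LAG are inferred from the log values,
# not read from any configuration; treat them as assumptions.
PEAK_LR = 5e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 155      # taken from the progress bar (N/155)
SCHEDULER_LAG = 3      # scheduler appears to trail the step counter by 3


def linear_schedule(scheduler_step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if scheduler_step < WARMUP_STEPS:
        return PEAK_LR * scheduler_step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - scheduler_step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


def logged_lr(global_step: int) -> float:
    """Learning rate expected in the log at a given progress-bar step."""
    return linear_schedule(max(0, global_step - SCHEDULER_LAG))


# A few (step, learning_rate) pairs copied from the log above.
LOGGED = {
    1: 0.0,
    5: 1.0000000000000002e-06,
    20: 8.500000000000002e-06,
    100: 4.85e-05,
    105: 4.8181818181818186e-05,
    125: 3e-05,
    155: 2.7272727272727272e-06,
}

for step, lr in LOGGED.items():
    print(step, lr, logged_lr(step), abs(logged_lr(step) - lr) < 1e-12)
```

If the assumed parameters are right, every row prints True in the last column; a False row would mean the schedule differs from this sketch.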
Time: 526.03
Samples/second: 14.54
GPU memory occupied: 81547 MB.
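Note on the throughput figures: the summary values can be cross-checked against each other with a little arithmetic. The sketch below uses only numbers copied from the log; the variable names are hypothetical, and the derived sample count is an estimate implied by the rounded throughput, not a value the log reports.

```python
# Sketch: cross-check the summary metrics printed at the end of training.
summary = {
    "train_runtime": 526.0281,            # seconds, from the log
    "train_samples_per_second": 14.543,   # from the log
    "train_steps_per_second": 0.295,      # from the log
}
TOTAL_STEPS = 155  # from the progress bar

# Steps per second and its inverse, the average time per step.
steps_per_second = TOTAL_STEPS / summary["train_runtime"]
seconds_per_step = summary["train_runtime"] / TOTAL_STEPS

# Estimated number of samples processed over the whole run, and per step.
approx_samples = summary["train_samples_per_second"] * summary["train_runtime"]
approx_samples_per_step = approx_samples / TOTAL_STEPS

print(f"steps/second:     {steps_per_second:.3f}")
print(f"seconds/step:     {seconds_per_step:.2f}")
print(f"samples (approx): {approx_samples:.0f}")
print(f"samples/step:     {approx_samples_per_step:.1f}")
```

The recomputed steps per second rounds to the logged 0.295, and the average time per step rounds to 3.39 s, which is slightly above the typical 3.33 s per step because a few steps ran slower.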