nohup: ignoring input
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
No config specified, defaulting to: apps/all
Found cached dataset apps (/home/user/.cache/huggingface/datasets/codeparrot___apps/all/0.0.0/04ac807715d07d6e5cc580f59cdc8213cd7dc4529d0bb819cca72c9f8e8c1aa5)
Max length: 2048
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
GPU memory occupied: 2667 MB.

  0%|          | 0/155 [00:00<?, ?it/s]
  1%|          | 1/155 [00:03<09:53,  3.85s/it]
                                               
{'loss': 28.4696, 'learning_rate': 0.0, 'epoch': 0.03}

  1%|          | 1/155 [00:03<09:53,  3.85s/it]
  1%|▏         | 2/155 [00:07<08:58,  3.52s/it]
  2%|▏         | 3/155 [00:10<08:38,  3.41s/it]
  3%|▎         | 4/155 [00:13<08:29,  3.37s/it]
  3%|▎         | 5/155 [00:17<08:22,  3.35s/it]
                                               
{'loss': 26.9592, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.16}

  3%|▎         | 5/155 [00:17<08:22,  3.35s/it]
  4%|▍         | 6/155 [00:20<08:16,  3.33s/it]
  5%|▍         | 7/155 [00:23<08:12,  3.33s/it]
  5%|▌         | 8/155 [00:26<08:08,  3.32s/it]
  6%|▌         | 9/155 [00:30<08:04,  3.32s/it]
  6%|▋         | 10/155 [00:33<08:01,  3.32s/it]
                                                
{'loss': 24.5245, 'learning_rate': 3.5000000000000004e-06, 'epoch': 0.31}

  6%|▋         | 10/155 [00:33<08:01,  3.32s/it]
  7%|▋         | 11/155 [00:36<07:57,  3.32s/it]
  8%|▊         | 12/155 [00:40<07:54,  3.32s/it]
  8%|▊         | 13/155 [00:43<07:51,  3.32s/it]
  9%|▉         | 14/155 [00:46<07:48,  3.32s/it]
 10%|▉         | 15/155 [00:50<07:45,  3.32s/it]
                                                
{'loss': 24.7316, 'learning_rate': 6e-06, 'epoch': 0.47}

 10%|▉         | 15/155 [00:50<07:45,  3.32s/it]
 10%|█         | 16/155 [00:53<07:41,  3.32s/it]
 11%|█         | 17/155 [00:56<07:38,  3.32s/it]
 12%|█▏        | 18/155 [01:00<07:35,  3.32s/it]
 12%|█▏        | 19/155 [01:03<07:32,  3.32s/it]
 13%|█▎        | 20/155 [01:06<07:28,  3.32s/it]
                                                
{'loss': 16.8561, 'learning_rate': 8.500000000000002e-06, 'epoch': 0.63}

 13%|█▎        | 20/155 [01:06<07:28,  3.32s/it]
 14%|█▎        | 21/155 [01:10<07:25,  3.32s/it]
 14%|█▍        | 22/155 [01:13<07:22,  3.33s/it]
 15%|█▍        | 23/155 [01:16<07:19,  3.33s/it]
 15%|█▌        | 24/155 [01:20<07:15,  3.33s/it]
 16%|█▌        | 25/155 [01:23<07:12,  3.33s/it]
                                                
{'loss': 5.1585, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.79}

 16%|█▌        | 25/155 [01:23<07:12,  3.33s/it]
 17%|█▋        | 26/155 [01:26<07:09,  3.33s/it]
 17%|█▋        | 27/155 [01:30<07:06,  3.33s/it]
 18%|█▊        | 28/155 [01:33<07:02,  3.33s/it]
 19%|█▊        | 29/155 [01:36<06:59,  3.33s/it]
 19%|█▉        | 30/155 [01:40<06:56,  3.33s/it]
                                                
{'loss': 1.6063, 'learning_rate': 1.3500000000000001e-05, 'epoch': 0.94}

 19%|█▉        | 30/155 [01:40<06:56,  3.33s/it]
 20%|██        | 31/155 [01:43<06:53,  3.33s/it]
 21%|██        | 32/155 [01:49<08:21,  4.08s/it]
 21%|██▏       | 33/155 [01:52<07:50,  3.85s/it]
 22%|██▏       | 34/155 [01:55<07:27,  3.70s/it]
 23%|██▎       | 35/155 [01:59<07:10,  3.59s/it]
                                                
{'loss': 1.1341, 'learning_rate': 1.6000000000000003e-05, 'epoch': 1.13}

 23%|██▎       | 35/155 [01:59<07:10,  3.59s/it]
 23%|██▎       | 36/155 [02:02<06:57,  3.51s/it]
 24%|██▍       | 37/155 [02:05<06:48,  3.46s/it]
 25%|██▍       | 38/155 [02:09<06:40,  3.42s/it]
 25%|██▌       | 39/155 [02:12<06:33,  3.39s/it]
 26%|██▌       | 40/155 [02:15<06:27,  3.37s/it]
                                                
{'loss': 0.8484, 'learning_rate': 1.85e-05, 'epoch': 1.28}

 26%|██▌       | 40/155 [02:15<06:27,  3.37s/it]
 26%|██▋       | 41/155 [02:19<06:23,  3.36s/it]
 27%|██▋       | 42/155 [02:22<06:18,  3.35s/it]
 28%|██▊       | 43/155 [02:25<06:14,  3.34s/it]
 28%|██▊       | 44/155 [02:29<06:10,  3.34s/it]
 29%|██▉       | 45/155 [02:32<06:06,  3.34s/it]
                                                
{'loss': 0.777, 'learning_rate': 2.1e-05, 'epoch': 1.44}

 29%|██▉       | 45/155 [02:32<06:06,  3.34s/it]
 30%|██▉       | 46/155 [02:35<06:03,  3.33s/it]
 30%|███       | 47/155 [02:39<05:59,  3.33s/it]
 31%|███       | 48/155 [02:42<05:56,  3.33s/it]
 32%|███▏      | 49/155 [02:45<05:53,  3.33s/it]
 32%|███▏      | 50/155 [02:49<05:49,  3.33s/it]
                                                
{'loss': 0.712, 'learning_rate': 2.35e-05, 'epoch': 1.6}

 32%|███▏      | 50/155 [02:49<05:49,  3.33s/it]
 33%|███▎      | 51/155 [02:52<05:46,  3.33s/it]
 34%|███▎      | 52/155 [02:55<05:43,  3.33s/it]
 34%|███▍      | 53/155 [02:59<05:39,  3.33s/it]
 35%|███▍      | 54/155 [03:02<05:36,  3.33s/it]
 35%|███▌      | 55/155 [03:05<05:33,  3.33s/it]
                                                
{'loss': 0.6714, 'learning_rate': 2.6000000000000002e-05, 'epoch': 1.76}

 35%|███▌      | 55/155 [03:05<05:33,  3.33s/it]
 36%|███▌      | 56/155 [03:09<05:29,  3.33s/it]
 37%|███▋      | 57/155 [03:12<05:26,  3.33s/it]
 37%|███▋      | 58/155 [03:15<05:22,  3.33s/it]
 38%|███▊      | 59/155 [03:19<05:19,  3.33s/it]
 39%|███▊      | 60/155 [03:22<05:16,  3.33s/it]
                                                
{'loss': 0.6215, 'learning_rate': 2.8499999999999998e-05, 'epoch': 1.91}

 39%|███▊      | 60/155 [03:22<05:16,  3.33s/it]
 39%|███▉      | 61/155 [03:25<05:12,  3.33s/it]
 40%|████      | 62/155 [03:29<05:09,  3.33s/it]
 41%|████      | 63/155 [03:34<06:14,  4.07s/it]
 41%|████▏     | 64/155 [03:38<05:50,  3.85s/it]
 42%|████▏     | 65/155 [03:41<05:32,  3.69s/it]
                                                
{'loss': 0.6794, 'learning_rate': 3.1e-05, 'epoch': 2.09}

 42%|████▏     | 65/155 [03:41<05:32,  3.69s/it]
 43%|████▎     | 66/155 [03:44<05:19,  3.59s/it]
 43%|████▎     | 67/155 [03:48<05:08,  3.51s/it]
 44%|████▍     | 68/155 [03:51<05:00,  3.45s/it]
 45%|████▍     | 69/155 [03:54<04:53,  3.42s/it]
 45%|████▌     | 70/155 [03:58<04:48,  3.39s/it]
                                                
{'loss': 0.5588, 'learning_rate': 3.35e-05, 'epoch': 2.25}

 45%|████▌     | 70/155 [03:58<04:48,  3.39s/it]
 46%|████▌     | 71/155 [04:01<04:43,  3.37s/it]
 46%|████▋     | 72/155 [04:04<04:38,  3.36s/it]
 47%|████▋     | 73/155 [04:08<04:34,  3.35s/it]
 48%|████▊     | 74/155 [04:11<04:30,  3.34s/it]
 48%|████▊     | 75/155 [04:14<04:27,  3.34s/it]
                                                
{'loss': 0.5589, 'learning_rate': 3.6e-05, 'epoch': 2.41}

 48%|████▊     | 75/155 [04:14<04:27,  3.34s/it]
 49%|████▉     | 76/155 [04:18<04:23,  3.34s/it]
 50%|████▉     | 77/155 [04:21<04:20,  3.33s/it]
 50%|█████     | 78/155 [04:24<04:16,  3.33s/it]
 51%|█████     | 79/155 [04:28<04:13,  3.33s/it]
 52%|█████▏    | 80/155 [04:31<04:09,  3.33s/it]
                                                
{'loss': 0.5018, 'learning_rate': 3.85e-05, 'epoch': 2.57}

 52%|█████▏    | 80/155 [04:31<04:09,  3.33s/it]
 52%|█████▏    | 81/155 [04:34<04:06,  3.33s/it]
 53%|█████▎    | 82/155 [04:38<04:03,  3.33s/it]
 54%|█████▎    | 83/155 [04:41<03:59,  3.33s/it]
 54%|█████▍    | 84/155 [04:44<03:56,  3.33s/it]
 55%|█████▍    | 85/155 [04:48<03:52,  3.33s/it]
                                                
{'loss': 0.5036, 'learning_rate': 4.1e-05, 'epoch': 2.72}

 55%|█████▍    | 85/155 [04:48<03:52,  3.33s/it]
 55%|█████▌    | 86/155 [04:51<03:49,  3.33s/it]
 56%|█████▌    | 87/155 [04:54<03:46,  3.33s/it]
 57%|█████▋    | 88/155 [04:58<03:43,  3.33s/it]
 57%|█████▋    | 89/155 [05:01<03:39,  3.33s/it]
 58%|█████▊    | 90/155 [05:04<03:36,  3.33s/it]
                                                
{'loss': 0.4761, 'learning_rate': 4.35e-05, 'epoch': 2.88}

 58%|█████▊    | 90/155 [05:04<03:36,  3.33s/it]
 59%|█████▊    | 91/155 [05:08<03:33,  3.33s/it]
 59%|█████▉    | 92/155 [05:11<03:29,  3.33s/it]
 60%|██████    | 93/155 [05:14<03:26,  3.33s/it]
 61%|██████    | 94/155 [05:20<04:08,  4.07s/it]
 61%|██████▏   | 95/155 [05:23<03:51,  3.85s/it]
                                                
{'loss': 0.5469, 'learning_rate': 4.600000000000001e-05, 'epoch': 3.06}

 61%|██████▏   | 95/155 [05:23<03:51,  3.85s/it]
 62%|██████▏   | 96/155 [05:27<03:37,  3.69s/it]
 63%|██████▎   | 97/155 [05:30<03:27,  3.58s/it]
 63%|██████▎   | 98/155 [05:33<03:19,  3.51s/it]
 64%|██████▍   | 99/155 [05:37<03:13,  3.45s/it]
 65%|██████▍   | 100/155 [05:40<03:07,  3.42s/it]
                                                 
{'loss': 0.4409, 'learning_rate': 4.85e-05, 'epoch': 3.22}

 65%|██████▍   | 100/155 [05:40<03:07,  3.42s/it]
 65%|██████▌   | 101/155 [05:43<03:03,  3.39s/it]
 66%|██████▌   | 102/155 [05:47<02:58,  3.37s/it]
 66%|██████▋   | 103/155 [05:50<02:54,  3.36s/it]
 67%|██████▋   | 104/155 [05:53<02:50,  3.35s/it]
 68%|██████▊   | 105/155 [05:57<02:46,  3.34s/it]
                                                 
{'loss': 0.4152, 'learning_rate': 4.8181818181818186e-05, 'epoch': 3.38}

 68%|██████▊   | 105/155 [05:57<02:46,  3.34s/it]
 68%|██████▊   | 106/155 [06:00<02:43,  3.34s/it]
 69%|██████▉   | 107/155 [06:03<02:39,  3.33s/it]
 70%|██████▉   | 108/155 [06:07<02:36,  3.33s/it]
 70%|███████   | 109/155 [06:10<02:33,  3.33s/it]
 71%|███████   | 110/155 [06:13<02:29,  3.33s/it]
                                                 
{'loss': 0.4137, 'learning_rate': 4.3636363636363636e-05, 'epoch': 3.54}

 71%|███████   | 110/155 [06:13<02:29,  3.33s/it]
 72%|███████▏  | 111/155 [06:17<02:26,  3.33s/it]
 72%|███████▏  | 112/155 [06:20<02:23,  3.33s/it]
 73%|███████▎  | 113/155 [06:23<02:19,  3.33s/it]
 74%|███████▎  | 114/155 [06:27<02:16,  3.33s/it]
 74%|███████▍  | 115/155 [06:30<02:13,  3.33s/it]
                                                 
{'loss': 0.3899, 'learning_rate': 3.909090909090909e-05, 'epoch': 3.69}

 74%|███████▍  | 115/155 [06:30<02:13,  3.33s/it]
 75%|███████▍  | 116/155 [06:33<02:09,  3.33s/it]
 75%|███████▌  | 117/155 [06:37<02:06,  3.33s/it]
 76%|███████▌  | 118/155 [06:40<02:03,  3.33s/it]
 77%|███████▋  | 119/155 [06:43<01:59,  3.33s/it]
 77%|███████▋  | 120/155 [06:47<01:56,  3.33s/it]
                                                 
{'loss': 0.3748, 'learning_rate': 3.454545454545455e-05, 'epoch': 3.85}

 77%|███████▋  | 120/155 [06:47<01:56,  3.33s/it]
 78%|███████▊  | 121/155 [06:50<01:53,  3.33s/it]
 79%|███████▊  | 122/155 [06:53<01:49,  3.33s/it]
 79%|███████▉  | 123/155 [06:57<01:46,  3.32s/it]
 80%|████████  | 124/155 [07:00<01:43,  3.32s/it]
 81%|████████  | 125/155 [07:06<02:02,  4.07s/it]
                                                 
{'loss': 0.4015, 'learning_rate': 3e-05, 'epoch': 4.03}

 81%|████████  | 125/155 [07:06<02:02,  4.07s/it]
 81%|████████▏ | 126/155 [07:09<01:51,  3.85s/it]
 82%|████████▏ | 127/155 [07:12<01:43,  3.69s/it]
 83%|████████▎ | 128/155 [07:16<01:36,  3.58s/it]
 83%|████████▎ | 129/155 [07:19<01:31,  3.50s/it]
 84%|████████▍ | 130/155 [07:22<01:26,  3.45s/it]
                                                 
{'loss': 0.36, 'learning_rate': 2.5454545454545454e-05, 'epoch': 4.19}

 84%|████████▍ | 130/155 [07:22<01:26,  3.45s/it]
 85%|████████▍ | 131/155 [07:26<01:21,  3.41s/it]
 85%|████████▌ | 132/155 [07:29<01:17,  3.39s/it]
 86%|████████▌ | 133/155 [07:32<01:14,  3.37s/it]
 86%|████████▋ | 134/155 [07:36<01:10,  3.36s/it]
 87%|████████▋ | 135/155 [07:39<01:06,  3.35s/it]
                                                 
{'loss': 0.3299, 'learning_rate': 2.090909090909091e-05, 'epoch': 4.35}

 87%|████████▋ | 135/155 [07:39<01:06,  3.35s/it]
 88%|████████▊ | 136/155 [07:42<01:03,  3.34s/it]
 88%|████████▊ | 137/155 [07:46<01:00,  3.33s/it]
 89%|████████▉ | 138/155 [07:49<00:56,  3.33s/it]
 90%|████████▉ | 139/155 [07:52<00:53,  3.33s/it]
 90%|█████████ | 140/155 [07:56<00:49,  3.33s/it]
                                                 
{'loss': 0.3332, 'learning_rate': 1.6363636363636366e-05, 'epoch': 4.5}

 90%|█████████ | 140/155 [07:56<00:49,  3.33s/it]
 91%|█████████ | 141/155 [07:59<00:46,  3.33s/it]
 92%|█████████▏| 142/155 [08:02<00:43,  3.32s/it]
 92%|█████████▏| 143/155 [08:06<00:39,  3.32s/it]
 93%|█████████▎| 144/155 [08:09<00:36,  3.32s/it]
 94%|█████████▎| 145/155 [08:12<00:33,  3.32s/it]
                                                 
{'loss': 0.307, 'learning_rate': 1.1818181818181819e-05, 'epoch': 4.66}

 94%|█████████▎| 145/155 [08:12<00:33,  3.32s/it]
 94%|█████████▍| 146/155 [08:16<00:29,  3.32s/it]
 95%|█████████▍| 147/155 [08:19<00:26,  3.32s/it]
 95%|█████████▌| 148/155 [08:22<00:23,  3.32s/it]
 96%|█████████▌| 149/155 [08:26<00:19,  3.33s/it]
 97%|█████████▋| 150/155 [08:29<00:16,  3.33s/it]
                                                 
{'loss': 0.2973, 'learning_rate': 7.272727272727272e-06, 'epoch': 4.82}

 97%|█████████▋| 150/155 [08:29<00:16,  3.33s/it]
 97%|█████████▋| 151/155 [08:32<00:13,  3.33s/it]
 98%|█████████▊| 152/155 [08:36<00:09,  3.33s/it]
 99%|█████████▊| 153/155 [08:39<00:06,  3.33s/it]
 99%|█████████▉| 154/155 [08:42<00:03,  3.33s/it]
100%|██████████| 155/155 [08:46<00:00,  3.33s/it]
                                                 
{'loss': 0.3096, 'learning_rate': 2.7272727272727272e-06, 'epoch': 4.98}

100%|██████████| 155/155 [08:46<00:00,  3.33s/it]
                                                 
{'train_runtime': 526.0281, 'train_samples_per_second': 14.543, 'train_steps_per_second': 0.295, 'train_loss': 3.648434014474192, 'epoch': 4.98}

100%|██████████| 155/155 [08:46<00:00,  3.33s/it]
100%|██████████| 155/155 [08:46<00:00,  3.39s/it]
Time: 526.03
Samples/second: 14.54
GPU memory occupied: 81547 MB.