[2024-08-07 04:07:32,726][Main][INFO] - Distributed environment: NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda Mixed precision type: bf16
[2024-08-07 04:07:32,726][Main][INFO] - Working directory is /workspace/nanoT5/logs/2024-08-07/04-07-32-
[2024-08-07 04:12:18,030][Main][INFO] - [train] Step 50 out of 25000 | Loss --> 11.343 | Grad_l2 --> 43.017 | Weights_l2 --> 47200.114 | Lr --> 0.000 | Seconds_per_step --> 5.126 |
[2024-08-07 04:13:43,886][Main][INFO] - [train] Step 100 out of 25000 | Loss --> 9.276 | Grad_l2 --> 45.242 | Weights_l2 --> 47199.978 | Lr --> 0.000 | Seconds_per_step --> 1.717 |
[2024-08-07 04:15:10,676][Main][INFO] - [train] Step 150 out of 25000 | Loss --> 8.145 | Grad_l2 --> 47.689 | Weights_l2 --> 47199.838 | Lr --> 0.000 | Seconds_per_step --> 1.736 |
[2024-08-07 04:16:36,244][Main][INFO] - [train] Step 200 out of 25000 | Loss --> 7.244 | Grad_l2 --> 49.128 | Weights_l2 --> 47199.701 | Lr --> 0.000 | Seconds_per_step --> 1.711 |
[2024-08-07 04:18:03,337][Main][INFO] - [train] Step 250 out of 25000 | Loss --> 6.159 | Grad_l2 --> 47.661 | Weights_l2 --> 47199.569 | Lr --> 0.000 | Seconds_per_step --> 1.742 |
[2024-08-07 04:19:30,382][Main][INFO] - [train] Step 300 out of 25000 | Loss --> 4.863 | Grad_l2 --> 42.692 | Weights_l2 --> 47199.433 | Lr --> 0.000 | Seconds_per_step --> 1.741 |
[2024-08-07 04:20:56,228][Main][INFO] - [train] Step 350 out of 25000 | Loss --> 3.515 | Grad_l2 --> 32.667 | Weights_l2 --> 47199.297 | Lr --> 0.000 | Seconds_per_step --> 1.717 |
[2024-08-07 04:22:22,722][Main][INFO] - [train] Step 400 out of 25000 | Loss --> 2.114 | Grad_l2 --> 15.583 | Weights_l2 --> 47199.164 | Lr --> 0.000 | Seconds_per_step --> 1.730 |
[2024-08-07 04:23:47,989][Main][INFO] - [train] Step 450 out of 25000 | Loss --> 0.978 | Grad_l2 --> 1.009 | Weights_l2 --> 47199.033 | Lr --> 0.000 | Seconds_per_step --> 1.705 |
[2024-08-07 04:25:15,079][Main][INFO] - [train] Step 500 out of 25000 | Loss --> 0.806 | Grad_l2 --> 0.498 | Weights_l2 --> 47198.897 | Lr --> 0.000 | Seconds_per_step --> 1.742 |
[2024-08-07 04:26:40,081][Main][INFO] - [train] Step 550 out of 25000 | Loss --> 0.808 | Grad_l2 --> 0.696 | Weights_l2 --> 47198.757 | Lr --> 0.000 | Seconds_per_step --> 1.700 |
[2024-08-07 04:28:06,532][Main][INFO] - [train] Step 600 out of 25000 | Loss --> 0.785 | Grad_l2 --> 0.455 | Weights_l2 --> 47198.620 | Lr --> 0.000 | Seconds_per_step --> 1.729 |
[2024-08-07 04:29:33,537][Main][INFO] - [train] Step 650 out of 25000 | Loss --> 0.787 | Grad_l2 --> 0.733 | Weights_l2 --> 47198.484 | Lr --> 0.000 | Seconds_per_step --> 1.740 |
[2024-08-07 04:30:59,366][Main][INFO] - [train] Step 700 out of 25000 | Loss --> 0.735 | Grad_l2 --> 0.463 | Weights_l2 --> 47198.347 | Lr --> 0.000 | Seconds_per_step --> 1.717 |
[2024-08-07 04:32:26,571][Main][INFO] - [train] Step 750 out of 25000 | Loss --> 0.714 | Grad_l2 --> 0.298 | Weights_l2 --> 47198.207 | Lr --> 0.000 | Seconds_per_step --> 1.744 |
[2024-08-07 04:33:51,938][Main][INFO] - [train] Step 800 out of 25000 | Loss --> 0.709 | Grad_l2 --> 0.358 | Weights_l2 --> 47198.070 | Lr --> 0.000 | Seconds_per_step --> 1.707 |
[2024-08-07 04:35:18,887][Main][INFO] - [train] Step 850 out of 25000 | Loss --> 0.697 | Grad_l2 --> 0.314 | Weights_l2 --> 47197.933 | Lr --> 0.000 | Seconds_per_step --> 1.739 |
[2024-08-07 04:36:44,507][Main][INFO] - [train] Step 900 out of 25000 | Loss --> 0.712 | Grad_l2 --> 0.621 | Weights_l2 --> 47197.796 | Lr --> 0.000 | Seconds_per_step --> 1.712 |
[2024-08-07 04:38:11,744][Main][INFO] - [train] Step 950 out of 25000 | Loss --> 0.681 | Grad_l2 --> 0.380 | Weights_l2 --> 47197.660 | Lr --> 0.000 | Seconds_per_step --> 1.745 |
[2024-08-07 04:39:37,561][Main][INFO] - [train] Step 1000 out of 25000 | Loss --> 0.697 | Grad_l2 --> 0.373 | Weights_l2 --> 47197.523 | Lr --> 0.000 | Seconds_per_step --> 1.716 |
[2024-08-07 04:41:04,005][Main][INFO] - [train] Step 1050 out of 25000 | Loss --> 0.706 | Grad_l2 --> 0.522 | Weights_l2 --> 47197.382 | Lr --> 0.000 | Seconds_per_step --> 1.729 |
[2024-08-07 04:42:29,034][Main][INFO] - [train] Step 1100 out of 25000 | Loss --> 0.685 | Grad_l2 --> 0.296 | Weights_l2 --> 47197.245 | Lr --> 0.000 | Seconds_per_step --> 1.701 |
[2024-08-07 04:43:55,529][Main][INFO] - [train] Step 1150 out of 25000 | Loss --> 0.674 | Grad_l2 --> 0.331 | Weights_l2 --> 47197.109 | Lr --> 0.000 | Seconds_per_step --> 1.730 |
[2024-08-07 04:45:20,774][Main][INFO] - [train] Step 1200 out of 25000 | Loss --> 0.667 | Grad_l2 --> 0.590 | Weights_l2 --> 47196.972 | Lr --> 0.000 | Seconds_per_step --> 1.705 |
[2024-08-07 04:46:47,532][Main][INFO] - [train] Step 1250 out of 25000 | Loss --> 0.665 | Grad_l2 --> 0.349 | Weights_l2 --> 47196.835 | Lr --> 0.000 | Seconds_per_step --> 1.735 |
[2024-08-07 04:48:14,549][Main][INFO] - [train] Step 1300 out of 25000 | Loss --> 0.663 | Grad_l2 --> 0.495 | Weights_l2 --> 47196.698 | Lr --> 0.000 | Seconds_per_step --> 1.740 |
[2024-08-07 04:49:40,267][Main][INFO] - [train] Step 1350 out of 25000 | Loss --> 0.646 | Grad_l2 --> 0.261 | Weights_l2 --> 47196.557 | Lr --> 0.000 | Seconds_per_step --> 1.714 |
[2024-08-07 04:51:07,119][Main][INFO] - [train] Step 1400 out of 25000 | Loss --> 0.626 | Grad_l2 --> 0.245 | Weights_l2 --> 47196.420 | Lr --> 0.000 | Seconds_per_step --> 1.737 |
[2024-08-07 04:52:32,692][Main][INFO] - [train] Step 1450 out of 25000 | Loss --> 0.642 | Grad_l2 --> 0.329 | Weights_l2 --> 47196.283 | Lr --> 0.000 | Seconds_per_step --> 1.711 |
[2024-08-07 04:53:59,630][Main][INFO] - [train] Step 1500 out of 25000 | Loss --> 0.641 | Grad_l2 --> 0.347 | Weights_l2 --> 47196.146 | Lr --> 0.000 | Seconds_per_step --> 1.739 |
[2024-08-07 04:55:25,305][Main][INFO] - [train] Step 1550 out of 25000 | Loss --> 0.645 | Grad_l2 --> 0.239 | Weights_l2 --> 47196.009 | Lr --> 0.000 | Seconds_per_step --> 1.713 |
[2024-08-07 04:56:52,437][Main][INFO] - [train] Step 1600 out of 25000 | Loss --> 0.670 | Grad_l2 --> 0.247 | Weights_l2 --> 47195.869 | Lr --> 0.000 | Seconds_per_step --> 1.743 |
[2024-08-07 04:58:18,243][Main][INFO] - [train] Step 1650 out of 25000 | Loss --> 0.630 | Grad_l2 --> 0.272 | Weights_l2 --> 47195.736 | Lr --> 0.000 | Seconds_per_step --> 1.716 |
[2024-08-07 04:59:45,425][Main][INFO] - [train] Step 1700 out of 25000 | Loss --> 0.639 | Grad_l2 --> 0.281 | Weights_l2 --> 47195.599 | Lr --> 0.000 | Seconds_per_step --> 1.744 |
[2024-08-07 05:01:11,208][Main][INFO] - [train] Step 1750 out of 25000 | Loss --> 0.640 | Grad_l2 --> 0.281 | Weights_l2 --> 47195.462 | Lr --> 0.000 | Seconds_per_step --> 1.716 |
[2024-08-07 05:02:38,396][Main][INFO] - [train] Step 1800 out of 25000 | Loss --> 0.626 | Grad_l2 --> 0.226 | Weights_l2 --> 47195.321 | Lr --> 0.000 | Seconds_per_step --> 1.744 |
[2024-08-07 05:04:04,169][Main][INFO] - [train] Step 1850 out of 25000 | Loss --> 0.632 | Grad_l2 --> 0.526 | Weights_l2 --> 47195.184 | Lr --> 0.000 | Seconds_per_step --> 1.715 |
[2024-08-07 05:05:31,337][Main][INFO] - [train] Step 1900 out of 25000 | Loss --> 0.623 | Grad_l2 --> 0.213 | Weights_l2 --> 47195.047 | Lr --> 0.000 | Seconds_per_step --> 1.743 |
[2024-08-07 05:06:58,450][Main][INFO] - [train] Step 1950 out of 25000 | Loss --> 0.616 | Grad_l2 --> 0.337 | Weights_l2 --> 47194.910 | Lr --> 0.000 | Seconds_per_step --> 1.742 |
[2024-08-07 05:08:24,208][Main][INFO] - [train] Step 2000 out of 25000 | Loss --> 0.632 | Grad_l2 --> 0.220 | Weights_l2 --> 47194.773 | Lr --> 0.000 | Seconds_per_step --> 1.715 |
[2024-08-07 05:09:51,402][Main][INFO] - [train] Step 2050 out of 25000 | Loss --> 0.615 | Grad_l2 --> 0.250 | Weights_l2 --> 47194.632 | Lr --> 0.000 | Seconds_per_step --> 1.744 |
[2024-08-07 05:11:17,134][Main][INFO] - [train] Step 2100 out of 25000 | Loss --> 0.614 | Grad_l2 --> 0.443 | Weights_l2 --> 47194.495 | Lr --> 0.000 | Seconds_per_step --> 1.715 |
[2024-08-07 05:12:44,287][Main][INFO] - [train] Step 2150 out of 25000 | Loss --> 0.623 | Grad_l2 --> 0.269 | Weights_l2 --> 47194.358 | Lr --> 0.000 | Seconds_per_step --> 1.743 |
[2024-08-07 05:14:09,998][Main][INFO] - [train] Step 2200 out of 25000 | Loss --> 0.623 | Grad_l2 --> 0.349 | Weights_l2 --> 47194.221 | Lr --> 0.000 | Seconds_per_step --> 1.714 |
[2024-08-07 05:15:37,150][Main][INFO] - [train] Step 2250 out of 25000 | Loss --> 0.602 | Grad_l2 --> 0.210 | Weights_l2 --> 47194.084 | Lr --> 0.000 | Seconds_per_step --> 1.743 |
[2024-08-07 05:17:02,865][Main][INFO] - [train] Step 2300 out of 25000 | Loss --> 0.593 | Grad_l2 --> 0.206 | Weights_l2 --> 47193.947 | Lr --> 0.000 | Seconds_per_step --> 1.714 |
[2024-08-07 05:18:29,986][Main][INFO] - [train] Step 2350 out of 25000 | Loss --> 0.596 | Grad_l2 --> 0.299 | Weights_l2 --> 47193.810 | Lr --> 0.000 | Seconds_per_step --> 1.742 |
[2024-08-07 05:19:55,710][Main][INFO] - [train] Step 2400 out of 25000 | Loss --> 0.634 | Grad_l2 --> 0.229 | Weights_l2 --> 47193.669 | Lr --> 0.000 | Seconds_per_step --> 1.714 |
[2024-08-07 05:21:22,808][Main][INFO] - [train] Step 2450 out of 25000 | Loss --> 0.634 | Grad_l2 --> 0.187 | Weights_l2 --> 47193.532 | Lr --> 0.000 | Seconds_per_step --> 1.742 |
[2024-08-07 05:22:48,543][Main][INFO] - [train] Step 2500 out of 25000 | Loss --> 0.628 | Grad_l2 --> 0.235 | Weights_l2 --> 47193.395 | Lr --> 0.000 | Seconds_per_step --> 1.715 |
[2024-08-07 05:24:15,698][Main][INFO] - [train] Step 2550 out of 25000 | Loss --> 0.621 | Grad_l2 --> 0.222 | Weights_l2 --> 47193.258 | Lr --> 0.000 | Seconds_per_step --> 1.743 |
[2024-08-07 05:25:41,174][Main][INFO] - [train] Step 2600 out of 25000 | Loss --> 0.598 | Grad_l2 --> 0.200 | Weights_l2 --> 47193.121 | Lr --> 0.000 | Seconds_per_step --> 1.710 |
[2024-08-07 05:27:08,312][Main][INFO] - [train] Step 2650 out of 25000 | Loss --> 0.605 | Grad_l2 --> 0.199 | Weights_l2 --> 47192.984 | Lr --> 0.000 | Seconds_per_step --> 1.743 |
[2024-08-07 05:28:35,494][Main][INFO] - [train] Step 2700 out of 25000 | Loss --> 0.616 | Grad_l2 --> 0.190 | Weights_l2 --> 47192.847 | Lr --> 0.000 | Seconds_per_step --> 1.744 |
[2024-08-07 05:30:01,277][Main][INFO] - [train] Step 2750 out of 25000 | Loss --> 0.638 | Grad_l2 --> 0.208 | Weights_l2 --> 47192.710 | Lr --> 0.000 | Seconds_per_step --> 1.716 |
[2024-08-07 05:31:28,184][Main][INFO] - [train] Step 2800 out of 25000 | Loss --> 0.625 | Grad_l2 --> 0.262 | Weights_l2 --> 47192.573 | Lr --> 0.000 | Seconds_per_step --> 1.738 |
[2024-08-07 05:32:53,230][Main][INFO] - [train] Step 2850 out of 25000 | Loss --> 0.609 | Grad_l2 --> 0.292 | Weights_l2 --> 47192.432 | Lr --> 0.000 | Seconds_per_step --> 1.701 |
[2024-08-07 05:34:19,792][Main][INFO] - [train] Step 2900 out of 25000 | Loss --> 0.597 | Grad_l2 --> 0.184 | Weights_l2 --> 47192.295 | Lr --> 0.000 | Seconds_per_step --> 1.731 |
[2024-08-07 05:35:44,824][Main][INFO] - [train] Step 2950 out of 25000 | Loss --> 0.593 | Grad_l2 --> 0.224 | Weights_l2 --> 47192.158 | Lr --> 0.000 | Seconds_per_step --> 1.701 |
[2024-08-07 05:37:11,229][Main][INFO] - [train] Step 3000 out of 25000 | Loss --> 0.638 | Grad_l2 --> 0.342 | Weights_l2 --> 47192.021 | Lr --> 0.000 | Seconds_per_step --> 1.728 |
[2024-08-07 05:38:36,575][Main][INFO] - [train] Step 3050 out of 25000 | Loss --> 0.599 | Grad_l2 --> 0.171 | Weights_l2 --> 47191.884 | Lr --> 0.000 | Seconds_per_step --> 1.707 |
[2024-08-07 05:40:03,729][Main][INFO] - [train] Step 3100 out of 25000 | Loss --> 0.592 | Grad_l2 --> 0.223 | Weights_l2 --> 47191.743 | Lr --> 0.000 | Seconds_per_step --> 1.743 |
[2024-08-07 05:41:29,379][Main][INFO] - [train] Step 3150 out of 25000 | Loss --> 0.601 | Grad_l2 --> 0.294 | Weights_l2 --> 47191.610 | Lr --> 0.000 | Seconds_per_step --> 1.713 |
[2024-08-07 05:42:56,819][Main][INFO] - [train] Step 3200 out of 25000 | Loss --> 0.575 | Grad_l2 --> 0.207 | Weights_l2 --> 47191.469 | Lr --> 0.000 | Seconds_per_step --> 1.749 |
[2024-08-07 05:44:22,604][Main][INFO] - [train] Step 3250 out of 25000 | Loss --> 0.596 | Grad_l2 --> 0.257 | Weights_l2 --> 47191.332 | Lr --> 0.000 | Seconds_per_step --> 1.716 |
[2024-08-07 05:45:49,918][Main][INFO] - [train] Step 3300 out of 25000 | Loss --> 0.612 | Grad_l2 --> 0.377 | Weights_l2 --> 47191.199 | Lr --> 0.000 | Seconds_per_step --> 1.746 |
[2024-08-07 05:47:15,156][Main][INFO] - [train] Step 3350 out of 25000 | Loss --> 0.612 | Grad_l2 --> 0.250 | Weights_l2 --> 47191.062 | Lr --> 0.000 | Seconds_per_step --> 1.705 |
[2024-08-07 05:48:42,282][Main][INFO] - [train] Step 3400 out of 25000 | Loss --> 0.591 | Grad_l2 --> 0.204 | Weights_l2 --> 47190.925 | Lr --> 0.000 | Seconds_per_step --> 1.743 |
[2024-08-07 05:50:08,971][Main][INFO] - [train] Step 3450 out of 25000 | Loss --> 0.592 | Grad_l2 --> 0.205 | Weights_l2 --> 47190.784 | Lr --> 0.000 | Seconds_per_step --> 1.734 |
[2024-08-07 05:51:34,049][Main][INFO] - [train] Step 3500 out of 25000 | Loss --> 0.604 | Grad_l2 --> 0.239 | Weights_l2 --> 47190.647 | Lr --> 0.000 | Seconds_per_step --> 1.702 |
[2024-08-07 05:53:01,117][Main][INFO] - [train] Step 3550 out of 25000 | Loss --> 0.563 | Grad_l2 --> 0.270 | Weights_l2 --> 47190.510 | Lr --> 0.000 | Seconds_per_step --> 1.741 |
[2024-08-07 05:54:26,716][Main][INFO] - [train] Step 3600 out of 25000 | Loss --> 0.585 | Grad_l2 --> 0.448 | Weights_l2 --> 47190.373 | Lr --> 0.000 | Seconds_per_step --> 1.712 |
[2024-08-07 05:55:53,130][Main][INFO] - [train] Step 3650 out of 25000 | Loss --> 0.596 | Grad_l2 --> 0.234 | Weights_l2 --> 47190.236 | Lr --> 0.000 | Seconds_per_step --> 1.728 |
[2024-08-07 05:57:18,221][Main][INFO] - [train] Step 3700 out of 25000 | Loss --> 0.616 | Grad_l2 --> 0.206 | Weights_l2 --> 47190.099 | Lr --> 0.000 | Seconds_per_step --> 1.702 |
[2024-08-07 05:58:45,613][Main][INFO] - [train] Step 3750 out of 25000 | Loss --> 0.562 | Grad_l2 --> 0.159 | Weights_l2 --> 47189.962 | Lr --> 0.000 | Seconds_per_step --> 1.748 |
[2024-08-07 06:00:11,446][Main][INFO] - [train] Step 3800 out of 25000 | Loss --> 0.588 | Grad_l2 --> 0.199 | Weights_l2 --> 47189.821 | Lr --> 0.000 | Seconds_per_step --> 1.717 |
[2024-08-07 06:01:38,826][Main][INFO] - [train] Step 3850 out of 25000 | Loss --> 0.581 | Grad_l2 --> 0.248 | Weights_l2 --> 47189.684 | Lr --> 0.000 | Seconds_per_step --> 1.748 |
[2024-08-07 06:03:05,865][Main][INFO] - [train] Step 3900 out of 25000 | Loss --> 0.597 | Grad_l2 --> 0.195 | Weights_l2 --> 47189.547 | Lr --> 0.000 | Seconds_per_step --> 1.741 |
[2024-08-07 06:04:31,625][Main][INFO] - [train] Step 3950 out of 25000 | Loss --> 0.595 | Grad_l2 --> 0.192 | Weights_l2 --> 47189.410 | Lr --> 0.000 | Seconds_per_step --> 1.715 |
[2024-08-07 06:05:59,020][Main][INFO] - [train] Step 4000 out of 25000 | Loss --> 0.584 | Grad_l2 --> 0.200 | Weights_l2 --> 47189.273 | Lr --> 0.000 | Seconds_per_step --> 1.748 |
[2024-08-07 06:09:10,498][Main][INFO] - [eval] Step 4000 out of 25000 | Loss --> 0.941 | Accuracy --> 0.832 | Time --> 191.474 |
[2024-08-07 06:13:48,778][absl][INFO] - Using default tokenizer.
[2024-08-07 06:13:49,269][Main][INFO] - [test] Step 4000 out of 25000 | Rougel --> 15.204 | Time --> 278.771 |
[2024-08-07 06:15:14,427][Main][INFO] - [train] Step 4050 out of 25000 | Loss --> 0.564 | Grad_l2 --> 0.288 | Weights_l2 --> 47189.135 | Lr --> 0.000 | Seconds_per_step --> 1.703 |
[2024-08-07 06:16:39,500][Main][INFO] - [train] Step 4100 out of 25000 | Loss --> 0.567 | Grad_l2 --> 0.205 | Weights_l2 --> 47188.998 | Lr --> 0.000 | Seconds_per_step --> 1.701 |
[2024-08-07 06:18:07,374][Main][INFO] - [train] Step 4150 out of 25000 | Loss --> 0.583 | Grad_l2 --> 0.203 | Weights_l2 --> 47188.861 | Lr --> 0.000 | Seconds_per_step --> 1.757 |
[2024-08-07 06:19:32,587][Main][INFO] - [train] Step 4200 out of 25000 | Loss --> 0.596 | Grad_l2 --> 0.269 | Weights_l2 --> 47188.724 | Lr --> 0.000 | Seconds_per_step --> 1.704 |
[2024-08-07 06:20:58,195][Main][INFO] - [train] Step 4250 out of 25000 | Loss --> 0.575 | Grad_l2 --> 0.279 | Weights_l2 --> 47188.587 | Lr --> 0.000 | Seconds_per_step --> 1.712 |
[2024-08-07 06:22:26,587][Main][INFO] - [train] Step 4300 out of 25000 | Loss --> 0.595 | Grad_l2 --> 0.256 | Weights_l2 --> 47188.446 | Lr --> 0.000 | Seconds_per_step --> 1.768 |
[2024-08-07 06:23:52,408][Main][INFO] - [train] Step 4350 out of 25000 | Loss --> 0.567 | Grad_l2 --> 0.185 | Weights_l2 --> 47188.309 | Lr --> 0.000 | Seconds_per_step --> 1.716 |
[2024-08-07 06:25:18,175][Main][INFO] - [train] Step 4400 out of 25000 | Loss --> 0.587 | Grad_l2 --> 0.188 | Weights_l2 --> 47188.172 | Lr --> 0.000 | Seconds_per_step --> 1.715 |
[2024-08-07 06:26:46,749][Main][INFO] - [train] Step 4450 out of 25000 | Loss --> 0.580 | Grad_l2 --> 0.164 | Weights_l2 --> 47188.035 | Lr --> 0.000 | Seconds_per_step --> 1.771 |
[2024-08-07 06:28:12,711][Main][INFO] - [train] Step 4500 out of 25000 | Loss --> 0.588 | Grad_l2 --> 0.221 | Weights_l2 --> 47187.898 | Lr --> 0.000 | Seconds_per_step --> 1.719 |
[2024-08-07 06:29:38,541][Main][INFO] - [train] Step 4550 out of 25000 | Loss --> 0.586 | Grad_l2 --> 0.191 | Weights_l2 --> 47187.761 | Lr --> 0.000 | Seconds_per_step --> 1.717 |
[2024-08-07 06:31:06,867][Main][INFO] - [train] Step 4600 out of 25000 | Loss --> 0.574 | Grad_l2 --> 0.215 | Weights_l2 --> 47187.628 | Lr --> 0.000 | Seconds_per_step --> 1.767 |
[2024-08-07 06:32:32,538][Main][INFO] - [train] Step 4650 out of 25000 | Loss --> 0.554 | Grad_l2 --> 0.220 | Weights_l2 --> 47187.487 | Lr --> 0.000 | Seconds_per_step --> 1.713 |
[2024-08-07 06:33:58,044][Main][INFO] - [train] Step 4700 out of 25000 | Loss --> 0.571 | Grad_l2 --> 0.200 | Weights_l2 --> 47187.346 | Lr --> 0.000 | Seconds_per_step --> 1.710 |
[2024-08-07 06:35:26,162][Main][INFO] - [train] Step 4750 out of 25000 | Loss --> 0.562 | Grad_l2 --> 0.273 | Weights_l2 --> 47187.209 | Lr --> 0.000 | Seconds_per_step --> 1.762 |
[2024-08-07 06:36:51,666][Main][INFO] - [train] Step 4800 out of 25000 | Loss --> 0.560 | Grad_l2 --> 0.188 | Weights_l2 --> 47187.071 | Lr --> 0.000 | Seconds_per_step --> 1.710 |
[2024-08-07 06:38:17,107][Main][INFO] - [train] Step 4850 out of 25000 | Loss --> 0.569 | Grad_l2 --> 0.293 | Weights_l2 --> 47186.938 | Lr --> 0.000 | Seconds_per_step --> 1.709 |
[2024-08-07 06:39:44,965][Main][INFO] - [train] Step 4900 out of 25000 | Loss --> 0.566 | Grad_l2 --> 0.177 | Weights_l2 --> 47186.797 | Lr --> 0.000 | Seconds_per_step --> 1.757 |
[2024-08-07 06:41:10,239][Main][INFO] - [train] Step 4950 out of 25000 | Loss --> 0.547 | Grad_l2 --> 0.170 | Weights_l2 --> 47186.660 | Lr --> 0.000 | Seconds_per_step --> 1.705 |
[2024-08-07 06:42:35,764][Main][INFO] - [train] Step 5000 out of 25000 | Loss --> 0.585 | Grad_l2 --> 0.219 | Weights_l2 --> 47186.523 | Lr --> 0.000 | Seconds_per_step --> 1.710 |
[2024-08-07 06:42:35,765][accelerate.accelerator][INFO] - Saving current state to checkpoint-ft-5000
[2024-08-07 06:42:35,771][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'lm_head.weight', 'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
[2024-08-07 06:42:36,580][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-ft-5000/model.safetensors
[2024-08-07 06:42:37,727][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-ft-5000/optimizer.bin
[2024-08-07 06:42:37,728][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-ft-5000/scheduler.bin
[2024-08-07 06:42:37,728][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-ft-5000/sampler.bin
[2024-08-07 06:42:37,728][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-ft-5000/sampler_1.bin
[2024-08-07 06:42:37,729][accelerate.checkpointing][INFO] - Random states saved in checkpoint-ft-5000/random_states_0.pkl
[2024-08-07 06:44:06,450][Main][INFO] - [train] Step 5050 out of 25000 | Loss --> 0.564 | Grad_l2 --> 0.230 | Weights_l2 --> 47186.386 | Lr --> 0.000 | Seconds_per_step --> 1.814 |
[2024-08-07 06:45:32,495][Main][INFO] - [train] Step 5100 out of 25000 | Loss --> 0.565 | Grad_l2 --> 0.191 | Weights_l2 --> 47186.248 | Lr --> 0.000 | Seconds_per_step --> 1.721 |
[2024-08-07 06:46:58,589][Main][INFO] - [train] Step 5150 out of 25000 | Loss --> 0.583 | Grad_l2 --> 0.170 | Weights_l2 --> 47186.111 | Lr --> 0.000 | Seconds_per_step --> 1.722 |
[2024-08-07 06:48:27,030][Main][INFO] - [train] Step 5200 out of 25000 | Loss --> 0.569 | Grad_l2 --> 0.164 | Weights_l2 --> 47185.974 | Lr --> 0.000 | Seconds_per_step --> 1.769 |
[2024-08-07 06:49:53,124][Main][INFO] - [train] Step 5250 out of 25000 | Loss --> 0.551 | Grad_l2 --> 0.174 | Weights_l2 --> 47185.837 | Lr --> 0.000 | Seconds_per_step --> 1.722 |
[2024-08-07 06:51:19,188][Main][INFO] - [train] Step 5300 out of 25000 | Loss --> 0.551 | Grad_l2 --> 0.179 | Weights_l2 --> 47185.700 | Lr --> 0.000 | Seconds_per_step --> 1.721 |
[2024-08-07 06:52:47,972][Main][INFO] - [train] Step 5350 out of 25000 | Loss --> 0.553 | Grad_l2 --> 0.183 | Weights_l2 --> 47185.558 | Lr --> 0.000 | Seconds_per_step --> 1.776 |
[2024-08-07 06:54:13,671][Main][INFO] - [train] Step 5400 out of 25000 | Loss --> 0.565 | Grad_l2 --> 0.206 | Weights_l2 --> 47185.421 | Lr --> 0.000 | Seconds_per_step --> 1.714 |
[2024-08-07 06:55:41,483][Main][INFO] - [train] Step 5450 out of 25000 | Loss --> 0.593 | Grad_l2 --> 0.205 | Weights_l2 --> 47185.288 | Lr --> 0.000 | Seconds_per_step --> 1.756 |
[2024-08-07 06:57:06,833][Main][INFO] - [train] Step 5500 out of 25000 | Loss --> 0.533 | Grad_l2 --> 0.157 | Weights_l2 --> 47185.151 | Lr --> 0.000 | Seconds_per_step --> 1.707 |
[2024-08-07 06:58:32,236][Main][INFO] - [train] Step 5550 out of 25000 | Loss --> 0.595 | Grad_l2 --> 0.209 | Weights_l2 --> 47185.009 | Lr --> 0.000 | Seconds_per_step --> 1.708 |
[2024-08-07 07:00:00,311][Main][INFO] - [train] Step 5600 out of 25000 | Loss --> 0.558 | Grad_l2 --> 0.202 | Weights_l2 --> 47184.872 | Lr --> 0.000 | Seconds_per_step --> 1.761 |
[2024-08-07 07:01:26,167][Main][INFO] - [train] Step 5650 out of 25000 | Loss --> 0.590 | Grad_l2 --> 0.157 | Weights_l2 --> 47184.735 | Lr --> 0.000 | Seconds_per_step --> 1.717 |
[2024-08-07 07:02:51,945][Main][INFO] - [train] Step 5700 out of 25000 | Loss --> 0.559 | Grad_l2 --> 0.160 | Weights_l2 --> 47184.598 | Lr --> 0.000 | Seconds_per_step --> 1.716 |
[2024-08-07 07:04:20,503][Main][INFO] - [train] Step 5750 out of 25000 | Loss --> 0.556 | Grad_l2 --> 0.169 | Weights_l2 --> 47184.460 | Lr --> 0.000 | Seconds_per_step --> 1.771 |
[2024-08-07 07:05:45,999][Main][INFO] - [train] Step 5800 out of 25000 | Loss --> 0.548 | Grad_l2 --> 0.176 | Weights_l2 --> 47184.323 | Lr --> 0.000 | Seconds_per_step --> 1.710 |
[2024-08-07 07:07:11,484][Main][INFO] - [train] Step 5850 out of 25000 | Loss --> 0.561 | Grad_l2 --> 0.161 | Weights_l2 --> 47184.186 | Lr --> 0.000 | Seconds_per_step --> 1.710 |
[2024-08-07 07:08:39,677][Main][INFO] - [train] Step 5900 out of 25000 | Loss --> 0.575 | Grad_l2 --> 0.157 | Weights_l2 --> 47184.049 | Lr --> 0.000 | Seconds_per_step --> 1.764 |
[2024-08-07 07:10:05,179][Main][INFO] - [train] Step 5950 out of 25000 | Loss --> 0.569 | Grad_l2 --> 0.147 | Weights_l2 --> 47183.911 | Lr --> 0.000 | Seconds_per_step --> 1.710 |
[2024-08-07 07:11:30,734][Main][INFO] - [train] Step 6000 out of 25000 | Loss --> 0.565 | Grad_l2 --> 0.202 | Weights_l2 --> 47183.774 | Lr --> 0.000 | Seconds_per_step --> 1.711 |
[2024-08-07 07:12:59,400][Main][INFO] - [train] Step 6050 out of 25000 | Loss --> 0.565 | Grad_l2 --> 0.148 | Weights_l2 --> 47183.633 | Lr --> 0.000 | Seconds_per_step --> 1.773 |
[2024-08-07 07:14:25,522][Main][INFO] - [train] Step 6100 out of 25000 | Loss --> 0.568 | Grad_l2 --> 0.178 | Weights_l2 --> 47183.496 | Lr --> 0.000 | Seconds_per_step --> 1.722 |
[2024-08-07 07:15:51,608][Main][INFO] - [train] Step 6150 out of 25000 | Loss --> 0.561 | Grad_l2 --> 0.176 | Weights_l2 --> 47183.362 | Lr --> 0.000 | Seconds_per_step --> 1.722 |
[2024-08-07 07:17:20,427][Main][INFO] - [train] Step 6200 out of 25000 | Loss --> 0.561 | Grad_l2 --> 0.166 | Weights_l2 --> 47183.225 | Lr --> 0.000 | Seconds_per_step --> 1.776 |
[2024-08-07 07:18:46,385][Main][INFO] - [train] Step 6250 out of 25000 | Loss --> 0.556 | Grad_l2 --> 0.181 | Weights_l2 --> 47183.084 | Lr --> 0.000 | Seconds_per_step --> 1.719 |
[2024-08-07 07:20:11,716][Main][INFO] - [train] Step 6300 out of 25000 | Loss --> 0.576 | Grad_l2 --> 0.153 | Weights_l2 --> 47182.951 | Lr --> 0.000 | Seconds_per_step --> 1.707 |
[2024-08-07 07:21:39,494][Main][INFO] - [train] Step 6350 out of 25000 | Loss --> 0.562 | Grad_l2 --> 0.193 | Weights_l2 --> 47182.813 | Lr --> 0.000 | Seconds_per_step --> 1.756 |
[2024-08-07 07:23:04,833][Main][INFO] - [train] Step 6400 out of 25000 | Loss --> 0.551 | Grad_l2 --> 0.158 | Weights_l2 --> 47182.676 | Lr --> 0.000 | Seconds_per_step --> 1.707 |
[2024-08-07 07:24:30,152][Main][INFO] - [train] Step 6450 out of 25000 | Loss --> 0.544 | Grad_l2 --> 0.160 | Weights_l2 --> 47182.539 | Lr --> 0.000 | Seconds_per_step --> 1.706 |
[2024-08-07 07:25:57,971][Main][INFO] - [train] Step 6500 out of 25000 | Loss --> 0.573 | Grad_l2 --> 0.179 | Weights_l2 --> 47182.398 | Lr --> 0.000 | Seconds_per_step --> 1.756 |
[2024-08-07 07:27:23,312][Main][INFO] - [train] Step 6550 out of 25000 | Loss --> 0.557 | Grad_l2 --> 0.217 | Weights_l2 --> 47182.260 | Lr --> 0.000 | Seconds_per_step --> 1.707 |
[2024-08-07 07:28:48,618][Main][INFO] - [train] Step 6600 out of 25000 | Loss --> 0.526 | Grad_l2 --> 0.152 | Weights_l2 --> 47182.127 | Lr --> 0.000 | Seconds_per_step --> 1.706 |
[2024-08-07 07:30:16,544][Main][INFO] - [train] Step 6650 out of 25000 | Loss --> 0.547 | Grad_l2 --> 0.150 | Weights_l2 --> 47181.986 | Lr --> 0.000 | Seconds_per_step --> 1.759 |
[2024-08-07 07:31:42,081][Main][INFO] - [train] Step 6700 out of 25000 | Loss --> 0.547 | Grad_l2 --> 0.186 | Weights_l2 --> 47181.848 | Lr --> 0.000 | Seconds_per_step --> 1.711 |
[2024-08-07 07:33:07,474][Main][INFO] - [train] Step 6750 out of 25000 | Loss --> 0.582 | Grad_l2 --> 0.156 | Weights_l2 --> 47181.711 | Lr --> 0.000 | Seconds_per_step --> 1.708 |
[2024-08-07 07:34:36,072][Main][INFO] - [train] Step 6800 out of 25000 | Loss --> 0.541 | Grad_l2 --> 0.188 | Weights_l2 --> 47181.574 | Lr --> 0.000 | Seconds_per_step --> 1.772 |
[2024-08-07 07:36:01,451][Main][INFO] - [train] Step 6850 out of 25000 | Loss --> 0.558 | Grad_l2 --> 0.205 | Weights_l2 --> 47181.437 | Lr --> 0.000 | Seconds_per_step --> 1.708 |
[2024-08-07 07:37:26,901][Main][INFO] - [train] Step 6900 out of 25000 | Loss --> 0.565 | Grad_l2 --> 0.152 | Weights_l2 --> 47181.299 | Lr --> 0.000 | Seconds_per_step --> 1.709 |
[2024-08-07 07:38:55,145][Main][INFO] - [train] Step 6950 out of 25000 | Loss --> 0.587 | Grad_l2 --> 0.193 | Weights_l2 --> 47181.162 | Lr --> 0.000 | Seconds_per_step --> 1.765 |
[2024-08-07 07:40:21,155][Main][INFO] - [train] Step 7000 out of 25000 | Loss --> 0.539 | Grad_l2 --> 0.172 | Weights_l2 --> 47181.025 | Lr --> 0.000 | Seconds_per_step --> 1.720 |
[2024-08-07 07:41:48,993][Main][INFO] - [train] Step 7050 out of 25000 | Loss --> 0.551 | Grad_l2 --> 0.168 | Weights_l2 --> 47180.887 | Lr --> 0.000 | Seconds_per_step --> 1.757 |
[2024-08-07 07:43:14,381][Main][INFO] - [train] Step 7100 out of 25000 | Loss --> 0.550 | Grad_l2 --> 0.143 | Weights_l2 --> 47180.750 | Lr --> 0.000 | Seconds_per_step --> 1.708 |
[2024-08-07 07:44:40,205][Main][INFO] - [train] Step 7150 out of 25000 | Loss --> 0.553 | Grad_l2 --> 0.148 | Weights_l2 --> 47180.609 | Lr --> 0.000 | Seconds_per_step --> 1.716 |
[2024-08-07 07:46:09,019][Main][INFO] - [train] Step 7200 out of 25000 | Loss --> 0.573 | Grad_l2 --> 0.179 | Weights_l2 --> 47180.472 | Lr --> 0.000 | Seconds_per_step --> 1.776 |
[2024-08-07 07:47:35,121][Main][INFO] - [train] Step 7250 out of 25000 | Loss --> 0.558 | Grad_l2 --> 0.170 | Weights_l2 --> 47180.338 | Lr --> 0.000 | Seconds_per_step --> 1.722 |
[2024-08-07 07:49:00,910][Main][INFO] - [train] Step 7300 out of 25000 | Loss --> 0.539 | Grad_l2 --> 0.161 | Weights_l2 --> 47180.201 | Lr --> 0.000 | Seconds_per_step --> 1.716 |
[2024-08-07 07:50:29,152][Main][INFO] - [train] Step 7350 out of 25000 | Loss --> 0.575 | Grad_l2 --> 0.159 | Weights_l2 --> 47180.064 | Lr --> 0.000 | Seconds_per_step --> 1.765 |
[2024-08-07 07:51:54,570][Main][INFO] - [train] Step 7400 out of 25000 | Loss --> 0.535 | Grad_l2 --> 0.143 | Weights_l2 --> 47179.926 | Lr --> 0.000 | Seconds_per_step --> 1.708 |
[2024-08-07 07:53:19,946][Main][INFO] - [train] Step 7450 out of 25000 | Loss --> 0.568 | Grad_l2 --> 0.153 | Weights_l2 --> 47179.789 | Lr --> 0.000 | Seconds_per_step --> 1.708 |
[2024-08-07 07:54:47,864][Main][INFO] - [train] Step 7500 out of 25000 | Loss --> 0.554 | Grad_l2 --> 0.161 | Weights_l2 --> 47179.652 | Lr --> 0.000 | Seconds_per_step --> 1.758 |
[2024-08-07 07:56:13,855][Main][INFO] - [train] Step 7550 out of 25000 | Loss --> 0.526 | Grad_l2 --> 0.140 | Weights_l2 --> 47179.511 | Lr --> 0.000 | Seconds_per_step --> 1.720 |
[2024-08-07 07:57:39,992][Main][INFO] - [train] Step 7600 out of 25000 | Loss --> 0.552 | Grad_l2 --> 0.201 | Weights_l2 --> 47179.377 | Lr --> 0.000 | Seconds_per_step --> 1.723 |
[2024-08-07 07:59:08,451][Main][INFO] - [train] Step 7650 out of 25000 | Loss --> 0.536 | Grad_l2 --> 0.179 | Weights_l2 --> 47179.236 | Lr --> 0.000 | Seconds_per_step --> 1.769 |
[2024-08-07 08:00:34,328][Main][INFO] - [train] Step 7700 out of 25000 | Loss --> 0.554 | Grad_l2 --> 0.158 | Weights_l2 --> 47179.103 | Lr --> 0.000 | Seconds_per_step --> 1.718 |
[2024-08-07 08:01:59,720][Main][INFO] - [train] Step 7750 out of 25000 | Loss --> 0.579 | Grad_l2 --> 0.156 | Weights_l2 --> 47178.965 | Lr --> 0.000 | Seconds_per_step --> 1.708 |
[2024-08-07 08:03:27,727][Main][INFO] - [train] Step 7800 out of 25000 | Loss --> 0.565 | Grad_l2 --> 0.143 | Weights_l2 --> 47178.828 | Lr --> 0.000 | Seconds_per_step --> 1.760 |
[2024-08-07 08:04:53,570][Main][INFO] - [train] Step 7850 out of 25000 | Loss --> 0.524 | Grad_l2 --> 0.151 | Weights_l2 --> 47178.687 | Lr --> 0.000 | Seconds_per_step --> 1.717 |
[2024-08-07 08:06:19,787][Main][INFO] - [train] Step 7900 out of 25000 | Loss --> 0.566 | Grad_l2 --> 0.145 | Weights_l2 --> 47178.549 | Lr --> 0.000 | Seconds_per_step --> 1.724 |
[2024-08-07 08:07:48,354][Main][INFO] - [train] Step 7950 out of 25000 | Loss --> 0.554 | Grad_l2 --> 0.212 | Weights_l2 --> 47178.412 | Lr --> 0.000 | Seconds_per_step --> 1.771 |
[2024-08-07 08:09:13,971][Main][INFO] - [train] Step 8000 out of 25000 | Loss --> 0.554 | Grad_l2 --> 0.167 | Weights_l2 --> 47178.275 | Lr --> 0.000 | Seconds_per_step --> 1.712 |
[2024-08-07 08:09:18,908][Main][INFO] - [eval] Step 8000 out of 25000 | Loss --> 0.880 | Accuracy --> 0.838 | Time --> 4.933 |
[2024-08-07 08:13:58,225][absl][INFO] - Using default tokenizer.
[2024-08-07 08:13:58,808][Main][INFO] - [test] Step 8000 out of 25000 | Rougel --> 21.696 | Time --> 279.900 |
[2024-08-07 08:15:24,876][Main][INFO] - [train] Step 8050 out of 25000 | Loss --> 0.539 | Grad_l2 --> 0.208 | Weights_l2 --> 47178.138 | Lr --> 0.000 | Seconds_per_step --> 1.721 |
[2024-08-07 08:16:53,543][Main][INFO] - [train] Step 8100 out of 25000 | Loss --> 0.556 | Grad_l2 --> 0.159 | Weights_l2 --> 47178.000 | Lr --> 0.000 | Seconds_per_step --> 1.773 |
[2024-08-07 08:18:19,674][Main][INFO] - [train] Step 8150 out of 25000 | Loss --> 0.540 | Grad_l2 --> 0.157 | Weights_l2 --> 47177.867 | Lr --> 0.000 | Seconds_per_step --> 1.723 |
[2024-08-07 08:19:45,669][Main][INFO] - [train] Step 8200 out of 25000 | Loss --> 0.566 | Grad_l2 --> 0.150 | Weights_l2 --> 47177.729 | Lr --> 0.000 | Seconds_per_step --> 1.720 |
[2024-08-07 08:21:13,985][Main][INFO] - [train] Step 8250 out of 25000 | Loss --> 0.546 | Grad_l2 --> 0.141 | Weights_l2 --> 47177.588 | Lr --> 0.000 | Seconds_per_step --> 1.766 |
[2024-08-07 08:22:39,528][Main][INFO] - [train] Step 8300 out of 25000 | Loss --> 0.554 | Grad_l2 --> 0.158 | Weights_l2 --> 47177.451 | Lr --> 0.000 | Seconds_per_step --> 1.711 |
[2024-08-07 08:24:05,082][Main][INFO] - [train] Step 8350 out of 25000 | Loss --> 0.526 | Grad_l2 --> 0.147 | Weights_l2 --> 47177.314 | Lr --> 0.000 | Seconds_per_step --> 1.711 |
[2024-08-07 08:25:33,450][Main][INFO] - [train] Step 8400 out of 25000 | Loss --> 0.539 | Grad_l2 --> 0.141 | Weights_l2 --> 47177.176 | Lr --> 0.000 | Seconds_per_step --> 1.767 |
[2024-08-07 08:26:59,375][Main][INFO] - [train] Step 8450 out of 25000 | Loss --> 0.526 | Grad_l2 --> 0.134 | Weights_l2 --> 47177.039 | Lr --> 0.000 | Seconds_per_step --> 1.718 |
[2024-08-07 08:28:25,341][Main][INFO] - [train] Step 8500 out of 25000 | Loss --> 0.528 | Grad_l2 --> 0.171 | Weights_l2 --> 47176.902 | Lr --> 0.000 | Seconds_per_step --> 1.719 |
[2024-08-07 08:29:53,919][Main][INFO] - [train] Step 8550 out of 25000 | Loss --> 0.546 | Grad_l2 --> 0.157 | Weights_l2 --> 47176.764 | Lr --> 0.000 | Seconds_per_step --> 1.772 |
[2024-08-07 08:31:19,464][Main][INFO] - [train] Step 8600 out of 25000 | Loss --> 0.553 | Grad_l2 --> 0.146 | Weights_l2 --> 47176.631 | Lr --> 0.000 | Seconds_per_step --> 1.711 |
[2024-08-07 08:32:47,795][Main][INFO] - [train] Step 8650 out of 25000 | Loss --> 0.532 | Grad_l2 --> 0.140 | Weights_l2 --> 47176.494 | Lr --> 0.000 | Seconds_per_step --> 1.767 |
[2024-08-07 08:34:13,984][Main][INFO] - [train] Step 8700 out of 25000 | Loss --> 0.553 | Grad_l2 --> 0.144 | Weights_l2 --> 47176.352 | Lr --> 0.000 | Seconds_per_step --> 1.724 |
[2024-08-07 08:35:40,153][Main][INFO] - [train] Step 8750 out of 25000 | Loss --> 0.537 | Grad_l2 --> 0.135 | Weights_l2 --> 47176.215 | Lr --> 0.000 | Seconds_per_step --> 1.723 |
[2024-08-07 08:37:08,880][Main][INFO] - [train] Step 8800 out of 25000 | Loss --> 0.552 | Grad_l2 --> 0.147 | Weights_l2 --> 47176.078 | Lr --> 0.000 | Seconds_per_step --> 1.775 |
[2024-08-07 08:38:35,095][Main][INFO] - [train] Step 8850 out of 25000 | Loss --> 0.564 | Grad_l2 --> 0.156 | Weights_l2 --> 47175.940 | Lr --> 0.000 | Seconds_per_step --> 1.724 |
[2024-08-07 08:40:01,329][Main][INFO] - [train] Step 8900 out of 25000 | Loss --> 0.525 | Grad_l2 --> 0.144 | Weights_l2 --> 47175.803 | Lr --> 0.000 | Seconds_per_step --> 1.725 |
[2024-08-07 08:41:29,387][Main][INFO] - [train] Step 8950 out of 25000 | Loss --> 0.544 | Grad_l2 --> 0.150 | Weights_l2 --> 47175.666 | Lr --> 0.000 | Seconds_per_step --> 1.761 |
[2024-08-07 08:42:55,260][Main][INFO] - [train] Step 9000 out of 25000 | Loss --> 0.518 | Grad_l2 --> 0.151 | Weights_l2 --> 47175.528 | Lr --> 0.000 | Seconds_per_step --> 1.717 |
[2024-08-07 08:44:21,359][Main][INFO] - [train] Step 9050 out of 25000 | Loss --> 0.562 | Grad_l2 --> 0.176 | Weights_l2 --> 47175.395 | Lr --> 0.000 | Seconds_per_step --> 1.722 |
[2024-08-07 08:45:50,226][Main][INFO] - [train] Step 9100 out of 25000 | Loss --> 0.532 | Grad_l2 --> 0.147 | Weights_l2 --> 47175.258 | Lr --> 0.000 | Seconds_per_step --> 1.777 |
[2024-08-07 08:47:16,084][Main][INFO] - [train] Step 9150 out of 25000 | Loss --> 0.543 | Grad_l2 --> 0.174 | Weights_l2 --> 47175.117 | Lr --> 0.000 | Seconds_per_step --> 1.717 |
[2024-08-07 08:48:41,636][Main][INFO] - [train] Step 9200 out of 25000 | Loss --> 0.546 | Grad_l2 --> 0.173 | Weights_l2 --> 47174.979 | Lr --> 0.000 | Seconds_per_step --> 1.711 |
[2024-08-07 08:50:10,103][Main][INFO] - [train] Step 9250 out of 25000 | Loss --> 0.557 | Grad_l2 --> 0.152 | Weights_l2 --> 47174.838 | Lr --> 0.000 | Seconds_per_step --> 1.769 |
[2024-08-07 08:51:36,133][Main][INFO] - [train] Step 9300 out of 25000 | Loss --> 0.536 | Grad_l2 --> 0.155 | Weights_l2 --> 47174.701 | Lr --> 0.000 | Seconds_per_step --> 1.721 |
[2024-08-07 08:53:02,290][Main][INFO] - [train] Step 9350 out of 25000 | Loss --> 0.552 | Grad_l2 --> 0.177 | Weights_l2 --> 47174.567 | Lr --> 0.000 | Seconds_per_step --> 1.723 |
[2024-08-07 08:54:30,686][Main][INFO] - [train] Step 9400 out of 25000 | Loss --> 0.558 | Grad_l2 --> 0.147 | Weights_l2 --> 47174.430 | Lr --> 0.000 | Seconds_per_step --> 1.768 |
[2024-08-07 08:55:56,533][Main][INFO] - [train] Step 9450 out of 25000 | Loss --> 0.540 | Grad_l2 --> 0.151 | Weights_l2 --> 47174.293 | Lr --> 0.000 | Seconds_per_step --> 1.717 |
[2024-08-07 08:57:22,391][Main][INFO] - [train] Step 9500 out of 25000 | Loss --> 0.507 | Grad_l2 --> 0.169 | Weights_l2 --> 47174.155 | Lr --> 0.000 | Seconds_per_step --> 1.717 |
[2024-08-07 08:58:50,627][Main][INFO] - [train] Step 9550 out of 25000 | Loss --> 0.534 | Grad_l2 --> 0.142 | Weights_l2 --> 47174.018 | Lr --> 0.000 | Seconds_per_step --> 1.765 |
[2024-08-07 09:00:16,767][Main][INFO] - [train] Step 9600 out of 25000 | Loss --> 0.519 | Grad_l2 --> 0.218 | Weights_l2 --> 47173.881 | Lr --> 0.000 | Seconds_per_step --> 1.723 |
[2024-08-07 09:01:48,187][Main][INFO] - [train] Step 9650 out of 25000 | Loss --> 0.542 |
Grad_l2 --> 0.153 | Weights_l2 --> 47173.743 | Lr --> 0.000 | Seconds_per_step --> 1.828 | [2024-08-07 09:03:13,777][Main][INFO] - [train] Step 9700 out of 25000 | Loss --> 0.536 | Grad_l2 --> 0.238 | Weights_l2 --> 47173.606 | Lr --> 0.000 | Seconds_per_step --> 1.712 | [2024-08-07 09:04:39,235][Main][INFO] - [train] Step 9750 out of 25000 | Loss --> 0.546 | Grad_l2 --> 0.151 | Weights_l2 --> 47173.469 | Lr --> 0.000 | Seconds_per_step --> 1.709 | [2024-08-07 09:06:07,645][Main][INFO] - [train] Step 9800 out of 25000 | Loss --> 0.550 | Grad_l2 --> 0.154 | Weights_l2 --> 47173.332 | Lr --> 0.000 | Seconds_per_step --> 1.768 | [2024-08-07 09:07:33,777][Main][INFO] - [train] Step 9850 out of 25000 | Loss --> 0.538 | Grad_l2 --> 0.146 | Weights_l2 --> 47173.194 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 09:08:59,343][Main][INFO] - [train] Step 9900 out of 25000 | Loss --> 0.561 | Grad_l2 --> 0.156 | Weights_l2 --> 47173.057 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 09:10:27,785][Main][INFO] - [train] Step 9950 out of 25000 | Loss --> 0.543 | Grad_l2 --> 0.143 | Weights_l2 --> 47172.920 | Lr --> 0.000 | Seconds_per_step --> 1.769 | [2024-08-07 09:11:53,857][Main][INFO] - [train] Step 10000 out of 25000 | Loss --> 0.547 | Grad_l2 --> 0.139 | Weights_l2 --> 47172.782 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-08-07 09:11:53,857][accelerate.accelerator][INFO] - Saving current state to checkpoint-ft-10000 [2024-08-07 09:11:53,863][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'lm_head.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-07 09:11:54,664][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-ft-10000/model.safetensors [2024-08-07 09:11:55,798][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-ft-10000/optimizer.bin [2024-08-07 09:11:55,798][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-ft-10000/scheduler.bin [2024-08-07 09:11:55,799][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-ft-10000/sampler.bin [2024-08-07 09:11:55,799][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-ft-10000/sampler_1.bin [2024-08-07 09:11:55,800][accelerate.checkpointing][INFO] - Random states saved in checkpoint-ft-10000/random_states_0.pkl [2024-08-07 09:13:22,152][Main][INFO] - [train] Step 10050 out of 25000 | Loss --> 0.530 | Grad_l2 --> 0.143 | Weights_l2 --> 47172.645 | Lr --> 0.000 | Seconds_per_step --> 1.766 | [2024-08-07 09:14:50,831][Main][INFO] - [train] Step 10100 out of 25000 | Loss --> 0.532 | Grad_l2 --> 0.517 | Weights_l2 --> 47172.508 | Lr --> 0.000 | Seconds_per_step --> 1.774 | [2024-08-07 09:16:16,529][Main][INFO] - [train] Step 10150 out of 25000 | Loss --> 0.577 | Grad_l2 --> 0.154 | Weights_l2 --> 47172.370 | Lr --> 0.000 | Seconds_per_step --> 1.714 | [2024-08-07 09:17:42,010][Main][INFO] - [train] Step 10200 out of 25000 | Loss --> 0.558 | Grad_l2 --> 0.145 | Weights_l2 --> 47172.233 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 09:19:10,193][Main][INFO] - [train] Step 10250 out of 25000 | Loss --> 0.556 | Grad_l2 --> 0.148 | Weights_l2 --> 47172.096 | Lr --> 0.000 | Seconds_per_step --> 1.764 | [2024-08-07 09:20:35,887][Main][INFO] - [train] Step 10300 out of 25000 | Loss --> 0.578 | Grad_l2 --> 0.179 | Weights_l2 --> 47171.958 | Lr --> 0.000 | Seconds_per_step --> 1.714 | [2024-08-07 09:22:01,651][Main][INFO] - [train] Step 10350 out of 
25000 | Loss --> 0.568 | Grad_l2 --> 0.149 | Weights_l2 --> 47171.821 | Lr --> 0.000 | Seconds_per_step --> 1.715 | [2024-08-07 09:23:30,262][Main][INFO] - [train] Step 10400 out of 25000 | Loss --> 0.577 | Grad_l2 --> 0.149 | Weights_l2 --> 47171.684 | Lr --> 0.000 | Seconds_per_step --> 1.772 | [2024-08-07 09:24:55,846][Main][INFO] - [train] Step 10450 out of 25000 | Loss --> 0.577 | Grad_l2 --> 0.197 | Weights_l2 --> 47171.546 | Lr --> 0.000 | Seconds_per_step --> 1.712 | [2024-08-07 09:26:21,432][Main][INFO] - [train] Step 10500 out of 25000 | Loss --> 0.595 | Grad_l2 --> 0.148 | Weights_l2 --> 47171.409 | Lr --> 0.000 | Seconds_per_step --> 1.712 | [2024-08-07 09:27:49,918][Main][INFO] - [train] Step 10550 out of 25000 | Loss --> 0.599 | Grad_l2 --> 0.181 | Weights_l2 --> 47171.272 | Lr --> 0.000 | Seconds_per_step --> 1.770 | [2024-08-07 09:29:16,097][Main][INFO] - [train] Step 10600 out of 25000 | Loss --> 0.578 | Grad_l2 --> 0.148 | Weights_l2 --> 47171.135 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 09:30:42,100][Main][INFO] - [train] Step 10650 out of 25000 | Loss --> 0.608 | Grad_l2 --> 0.156 | Weights_l2 --> 47170.997 | Lr --> 0.000 | Seconds_per_step --> 1.720 | [2024-08-07 09:32:10,527][Main][INFO] - [train] Step 10700 out of 25000 | Loss --> 0.611 | Grad_l2 --> 0.181 | Weights_l2 --> 47170.860 | Lr --> 0.000 | Seconds_per_step --> 1.769 | [2024-08-07 09:33:36,488][Main][INFO] - [train] Step 10750 out of 25000 | Loss --> 0.615 | Grad_l2 --> 0.146 | Weights_l2 --> 47170.723 | Lr --> 0.000 | Seconds_per_step --> 1.719 | [2024-08-07 09:35:02,437][Main][INFO] - [train] Step 10800 out of 25000 | Loss --> 0.616 | Grad_l2 --> 0.151 | Weights_l2 --> 47170.585 | Lr --> 0.000 | Seconds_per_step --> 1.719 | [2024-08-07 09:36:31,350][Main][INFO] - [train] Step 10850 out of 25000 | Loss --> 0.606 | Grad_l2 --> 0.165 | Weights_l2 --> 47170.448 | Lr --> 0.000 | Seconds_per_step --> 1.778 | [2024-08-07 09:37:57,470][Main][INFO] - [train] Step 10900 out 
of 25000 | Loss --> 0.575 | Grad_l2 --> 0.139 | Weights_l2 --> 47170.311 | Lr --> 0.000 | Seconds_per_step --> 1.722 | [2024-08-07 09:39:23,644][Main][INFO] - [train] Step 10950 out of 25000 | Loss --> 0.609 | Grad_l2 --> 0.145 | Weights_l2 --> 47170.173 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 09:40:52,251][Main][INFO] - [train] Step 11000 out of 25000 | Loss --> 0.585 | Grad_l2 --> 0.151 | Weights_l2 --> 47170.036 | Lr --> 0.000 | Seconds_per_step --> 1.772 | [2024-08-07 09:42:18,278][Main][INFO] - [train] Step 11050 out of 25000 | Loss --> 0.627 | Grad_l2 --> 0.149 | Weights_l2 --> 47169.899 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-08-07 09:43:44,271][Main][INFO] - [train] Step 11100 out of 25000 | Loss --> 0.624 | Grad_l2 --> 0.149 | Weights_l2 --> 47169.762 | Lr --> 0.000 | Seconds_per_step --> 1.720 | [2024-08-07 09:45:12,962][Main][INFO] - [train] Step 11150 out of 25000 | Loss --> 0.641 | Grad_l2 --> 0.165 | Weights_l2 --> 47169.620 | Lr --> 0.000 | Seconds_per_step --> 1.774 | [2024-08-07 09:46:38,824][Main][INFO] - [train] Step 11200 out of 25000 | Loss --> 0.633 | Grad_l2 --> 0.160 | Weights_l2 --> 47169.483 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-08-07 09:48:06,984][Main][INFO] - [train] Step 11250 out of 25000 | Loss --> 0.625 | Grad_l2 --> 0.161 | Weights_l2 --> 47169.350 | Lr --> 0.000 | Seconds_per_step --> 1.763 | [2024-08-07 09:49:32,458][Main][INFO] - [train] Step 11300 out of 25000 | Loss --> 0.629 | Grad_l2 --> 0.152 | Weights_l2 --> 47169.212 | Lr --> 0.000 | Seconds_per_step --> 1.709 | [2024-08-07 09:50:58,382][Main][INFO] - [train] Step 11350 out of 25000 | Loss --> 0.636 | Grad_l2 --> 0.150 | Weights_l2 --> 47169.075 | Lr --> 0.000 | Seconds_per_step --> 1.718 | [2024-08-07 09:52:27,146][Main][INFO] - [train] Step 11400 out of 25000 | Loss --> 0.619 | Grad_l2 --> 0.149 | Weights_l2 --> 47168.938 | Lr --> 0.000 | Seconds_per_step --> 1.775 | [2024-08-07 09:53:52,893][Main][INFO] - [train] Step 11450 
out of 25000 | Loss --> 0.640 | Grad_l2 --> 0.152 | Weights_l2 --> 47168.800 | Lr --> 0.000 | Seconds_per_step --> 1.715 | [2024-08-07 09:55:18,641][Main][INFO] - [train] Step 11500 out of 25000 | Loss --> 0.636 | Grad_l2 --> 0.188 | Weights_l2 --> 47168.663 | Lr --> 0.000 | Seconds_per_step --> 1.715 | [2024-08-07 09:56:47,125][Main][INFO] - [train] Step 11550 out of 25000 | Loss --> 0.628 | Grad_l2 --> 0.188 | Weights_l2 --> 47168.526 | Lr --> 0.000 | Seconds_per_step --> 1.770 | [2024-08-07 09:58:13,217][Main][INFO] - [train] Step 11600 out of 25000 | Loss --> 0.655 | Grad_l2 --> 0.161 | Weights_l2 --> 47168.392 | Lr --> 0.000 | Seconds_per_step --> 1.722 | [2024-08-07 09:59:38,994][Main][INFO] - [train] Step 11650 out of 25000 | Loss --> 0.613 | Grad_l2 --> 0.153 | Weights_l2 --> 47168.251 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 10:01:07,252][Main][INFO] - [train] Step 11700 out of 25000 | Loss --> 0.638 | Grad_l2 --> 0.158 | Weights_l2 --> 47168.114 | Lr --> 0.000 | Seconds_per_step --> 1.765 | [2024-08-07 10:02:32,915][Main][INFO] - [train] Step 11750 out of 25000 | Loss --> 0.633 | Grad_l2 --> 0.147 | Weights_l2 --> 47167.977 | Lr --> 0.000 | Seconds_per_step --> 1.713 | [2024-08-07 10:03:59,142][Main][INFO] - [train] Step 11800 out of 25000 | Loss --> 0.638 | Grad_l2 --> 0.192 | Weights_l2 --> 47167.839 | Lr --> 0.000 | Seconds_per_step --> 1.725 | [2024-08-07 10:05:27,859][Main][INFO] - [train] Step 11850 out of 25000 | Loss --> 0.633 | Grad_l2 --> 0.149 | Weights_l2 --> 47167.702 | Lr --> 0.000 | Seconds_per_step --> 1.774 | [2024-08-07 10:06:53,666][Main][INFO] - [train] Step 11900 out of 25000 | Loss --> 0.632 | Grad_l2 --> 0.145 | Weights_l2 --> 47167.565 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 10:08:19,233][Main][INFO] - [train] Step 11950 out of 25000 | Loss --> 0.628 | Grad_l2 --> 0.165 | Weights_l2 --> 47167.431 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 10:09:48,031][Main][INFO] - [train] Step 
12000 out of 25000 | Loss --> 0.627 | Grad_l2 --> 0.148 | Weights_l2 --> 47167.294 | Lr --> 0.000 | Seconds_per_step --> 1.776 | [2024-08-07 10:09:52,932][Main][INFO] - [eval] Step 12000 out of 25000 | Loss --> 0.851 | Accuracy --> 0.842 | Time --> 4.898 | [2024-08-07 10:14:26,659][absl][INFO] - Using default tokenizer. [2024-08-07 10:14:27,253][Main][INFO] - [test] Step 12000 out of 25000 | Rougel --> 22.493 | Time --> 274.321 | [2024-08-07 10:15:53,027][Main][INFO] - [train] Step 12050 out of 25000 | Loss --> 0.650 | Grad_l2 --> 0.157 | Weights_l2 --> 47167.157 | Lr --> 0.000 | Seconds_per_step --> 1.715 | [2024-08-07 10:17:18,994][Main][INFO] - [train] Step 12100 out of 25000 | Loss --> 0.639 | Grad_l2 --> 0.173 | Weights_l2 --> 47167.020 | Lr --> 0.000 | Seconds_per_step --> 1.719 | [2024-08-07 10:18:47,538][Main][INFO] - [train] Step 12150 out of 25000 | Loss --> 0.645 | Grad_l2 --> 0.162 | Weights_l2 --> 47166.878 | Lr --> 0.000 | Seconds_per_step --> 1.771 | [2024-08-07 10:20:13,740][Main][INFO] - [train] Step 12200 out of 25000 | Loss --> 0.655 | Grad_l2 --> 0.185 | Weights_l2 --> 47166.741 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 10:21:39,599][Main][INFO] - [train] Step 12250 out of 25000 | Loss --> 0.659 | Grad_l2 --> 0.154 | Weights_l2 --> 47166.604 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-08-07 10:23:08,120][Main][INFO] - [train] Step 12300 out of 25000 | Loss --> 0.639 | Grad_l2 --> 0.156 | Weights_l2 --> 47166.466 | Lr --> 0.000 | Seconds_per_step --> 1.770 | [2024-08-07 10:24:33,959][Main][INFO] - [train] Step 12350 out of 25000 | Loss --> 0.622 | Grad_l2 --> 0.145 | Weights_l2 --> 47166.329 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-08-07 10:26:00,262][Main][INFO] - [train] Step 12400 out of 25000 | Loss --> 0.653 | Grad_l2 --> 0.155 | Weights_l2 --> 47166.192 | Lr --> 0.000 | Seconds_per_step --> 1.726 | [2024-08-07 10:27:28,963][Main][INFO] - [train] Step 12450 out of 25000 | Loss --> 0.633 | Grad_l2 --> 0.160 
| Weights_l2 --> 47166.058 | Lr --> 0.000 | Seconds_per_step --> 1.774 | [2024-08-07 10:28:55,101][Main][INFO] - [train] Step 12500 out of 25000 | Loss --> 0.626 | Grad_l2 --> 0.142 | Weights_l2 --> 47165.921 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 10:30:20,881][Main][INFO] - [train] Step 12550 out of 25000 | Loss --> 0.618 | Grad_l2 --> 0.147 | Weights_l2 --> 47165.784 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 10:31:49,106][Main][INFO] - [train] Step 12600 out of 25000 | Loss --> 0.658 | Grad_l2 --> 0.189 | Weights_l2 --> 47165.647 | Lr --> 0.000 | Seconds_per_step --> 1.765 | [2024-08-07 10:33:15,133][Main][INFO] - [train] Step 12650 out of 25000 | Loss --> 0.651 | Grad_l2 --> 0.150 | Weights_l2 --> 47165.509 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-08-07 10:34:44,001][Main][INFO] - [train] Step 12700 out of 25000 | Loss --> 0.631 | Grad_l2 --> 0.153 | Weights_l2 --> 47165.368 | Lr --> 0.000 | Seconds_per_step --> 1.777 | [2024-08-07 10:36:10,488][Main][INFO] - [train] Step 12750 out of 25000 | Loss --> 0.616 | Grad_l2 --> 0.151 | Weights_l2 --> 47165.231 | Lr --> 0.000 | Seconds_per_step --> 1.730 | [2024-08-07 10:37:45,117][Main][INFO] - [train] Step 12800 out of 25000 | Loss --> 0.660 | Grad_l2 --> 0.154 | Weights_l2 --> 47165.097 | Lr --> 0.000 | Seconds_per_step --> 1.893 | [2024-08-07 10:39:14,106][Main][INFO] - [train] Step 12850 out of 25000 | Loss --> 0.648 | Grad_l2 --> 0.147 | Weights_l2 --> 47164.960 | Lr --> 0.000 | Seconds_per_step --> 1.780 | [2024-08-07 10:40:39,804][Main][INFO] - [train] Step 12900 out of 25000 | Loss --> 0.623 | Grad_l2 --> 0.144 | Weights_l2 --> 47164.823 | Lr --> 0.000 | Seconds_per_step --> 1.714 | [2024-08-07 10:42:05,250][Main][INFO] - [train] Step 12950 out of 25000 | Loss --> 0.641 | Grad_l2 --> 0.165 | Weights_l2 --> 47164.685 | Lr --> 0.000 | Seconds_per_step --> 1.709 | [2024-08-07 10:43:33,303][Main][INFO] - [train] Step 13000 out of 25000 | Loss --> 0.659 | Grad_l2 --> 
0.148 | Weights_l2 --> 47164.548 | Lr --> 0.000 | Seconds_per_step --> 1.761 | [2024-08-07 10:44:59,507][Main][INFO] - [train] Step 13050 out of 25000 | Loss --> 0.657 | Grad_l2 --> 0.224 | Weights_l2 --> 47164.411 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 10:46:25,581][Main][INFO] - [train] Step 13100 out of 25000 | Loss --> 0.668 | Grad_l2 --> 0.163 | Weights_l2 --> 47164.273 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-08-07 10:47:53,904][Main][INFO] - [train] Step 13150 out of 25000 | Loss --> 0.671 | Grad_l2 --> 0.175 | Weights_l2 --> 47164.136 | Lr --> 0.000 | Seconds_per_step --> 1.766 | [2024-08-07 10:49:19,864][Main][INFO] - [train] Step 13200 out of 25000 | Loss --> 0.666 | Grad_l2 --> 0.158 | Weights_l2 --> 47163.999 | Lr --> 0.000 | Seconds_per_step --> 1.719 | [2024-08-07 10:50:45,992][Main][INFO] - [train] Step 13250 out of 25000 | Loss --> 0.681 | Grad_l2 --> 0.158 | Weights_l2 --> 47163.862 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 10:52:15,654][Main][INFO] - [train] Step 13300 out of 25000 | Loss --> 0.653 | Grad_l2 --> 0.167 | Weights_l2 --> 47163.728 | Lr --> 0.000 | Seconds_per_step --> 1.793 | [2024-08-07 10:53:41,694][Main][INFO] - [train] Step 13350 out of 25000 | Loss --> 0.651 | Grad_l2 --> 0.154 | Weights_l2 --> 47163.587 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-08-07 10:55:07,494][Main][INFO] - [train] Step 13400 out of 25000 | Loss --> 0.683 | Grad_l2 --> 0.161 | Weights_l2 --> 47163.450 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 10:56:35,968][Main][INFO] - [train] Step 13450 out of 25000 | Loss --> 0.666 | Grad_l2 --> 0.168 | Weights_l2 --> 47163.312 | Lr --> 0.000 | Seconds_per_step --> 1.769 | [2024-08-07 10:58:01,621][Main][INFO] - [train] Step 13500 out of 25000 | Loss --> 0.650 | Grad_l2 --> 0.158 | Weights_l2 --> 47163.175 | Lr --> 0.000 | Seconds_per_step --> 1.713 | [2024-08-07 10:59:27,354][Main][INFO] - [train] Step 13550 out of 25000 | Loss --> 0.691 | Grad_l2 
--> 0.171 | Weights_l2 --> 47163.038 | Lr --> 0.000 | Seconds_per_step --> 1.715 | [2024-08-07 11:00:55,576][Main][INFO] - [train] Step 13600 out of 25000 | Loss --> 0.678 | Grad_l2 --> 0.177 | Weights_l2 --> 47162.901 | Lr --> 0.000 | Seconds_per_step --> 1.764 | [2024-08-07 11:02:21,441][Main][INFO] - [train] Step 13650 out of 25000 | Loss --> 0.711 | Grad_l2 --> 0.170 | Weights_l2 --> 47162.763 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-08-07 11:03:47,588][Main][INFO] - [train] Step 13700 out of 25000 | Loss --> 0.696 | Grad_l2 --> 0.170 | Weights_l2 --> 47162.630 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 11:05:16,219][Main][INFO] - [train] Step 13750 out of 25000 | Loss --> 0.711 | Grad_l2 --> 0.163 | Weights_l2 --> 47162.493 | Lr --> 0.000 | Seconds_per_step --> 1.773 | [2024-08-07 11:06:41,706][Main][INFO] - [train] Step 13800 out of 25000 | Loss --> 0.715 | Grad_l2 --> 0.163 | Weights_l2 --> 47162.352 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 11:08:07,195][Main][INFO] - [train] Step 13850 out of 25000 | Loss --> 0.694 | Grad_l2 --> 0.164 | Weights_l2 --> 47162.218 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 11:09:35,388][Main][INFO] - [train] Step 13900 out of 25000 | Loss --> 0.680 | Grad_l2 --> 0.152 | Weights_l2 --> 47162.081 | Lr --> 0.000 | Seconds_per_step --> 1.764 | [2024-08-07 11:11:00,908][Main][INFO] - [train] Step 13950 out of 25000 | Loss --> 0.696 | Grad_l2 --> 0.168 | Weights_l2 --> 47161.944 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 11:12:26,729][Main][INFO] - [train] Step 14000 out of 25000 | Loss --> 0.711 | Grad_l2 --> 0.157 | Weights_l2 --> 47161.806 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 11:13:55,581][Main][INFO] - [train] Step 14050 out of 25000 | Loss --> 0.682 | Grad_l2 --> 0.166 | Weights_l2 --> 47161.669 | Lr --> 0.000 | Seconds_per_step --> 1.777 | [2024-08-07 11:15:21,781][Main][INFO] - [train] Step 14100 out of 25000 | Loss --> 0.709 | 
Grad_l2 --> 0.158 | Weights_l2 --> 47161.532 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 11:16:47,977][Main][INFO] - [train] Step 14150 out of 25000 | Loss --> 0.733 | Grad_l2 --> 0.160 | Weights_l2 --> 47161.394 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 11:18:16,612][Main][INFO] - [train] Step 14200 out of 25000 | Loss --> 0.728 | Grad_l2 --> 0.168 | Weights_l2 --> 47161.257 | Lr --> 0.000 | Seconds_per_step --> 1.773 | [2024-08-07 11:19:42,684][Main][INFO] - [train] Step 14250 out of 25000 | Loss --> 0.707 | Grad_l2 --> 0.160 | Weights_l2 --> 47161.120 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-08-07 11:21:08,147][Main][INFO] - [train] Step 14300 out of 25000 | Loss --> 0.717 | Grad_l2 --> 0.156 | Weights_l2 --> 47160.982 | Lr --> 0.000 | Seconds_per_step --> 1.709 | [2024-08-07 11:22:36,306][Main][INFO] - [train] Step 14350 out of 25000 | Loss --> 0.712 | Grad_l2 --> 0.157 | Weights_l2 --> 47160.845 | Lr --> 0.000 | Seconds_per_step --> 1.763 | [2024-08-07 11:24:01,811][Main][INFO] - [train] Step 14400 out of 25000 | Loss --> 0.720 | Grad_l2 --> 0.164 | Weights_l2 --> 47160.708 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 11:25:27,346][Main][INFO] - [train] Step 14450 out of 25000 | Loss --> 0.706 | Grad_l2 --> 0.160 | Weights_l2 --> 47160.570 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 11:26:55,597][Main][INFO] - [train] Step 14500 out of 25000 | Loss --> 0.731 | Grad_l2 --> 0.159 | Weights_l2 --> 47160.433 | Lr --> 0.000 | Seconds_per_step --> 1.765 | [2024-08-07 11:28:21,073][Main][INFO] - [train] Step 14550 out of 25000 | Loss --> 0.713 | Grad_l2 --> 0.159 | Weights_l2 --> 47160.296 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 11:29:46,549][Main][INFO] - [train] Step 14600 out of 25000 | Loss --> 0.712 | Grad_l2 --> 0.170 | Weights_l2 --> 47160.155 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 11:31:14,793][Main][INFO] - [train] Step 14650 out of 25000 | Loss --> 0.732 
| Grad_l2 --> 0.156 | Weights_l2 --> 47160.021 | Lr --> 0.000 | Seconds_per_step --> 1.765 | [2024-08-07 11:32:40,288][Main][INFO] - [train] Step 14700 out of 25000 | Loss --> 0.725 | Grad_l2 --> 0.154 | Weights_l2 --> 47159.884 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 11:34:05,850][Main][INFO] - [train] Step 14750 out of 25000 | Loss --> 0.689 | Grad_l2 --> 0.147 | Weights_l2 --> 47159.747 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 11:35:34,608][Main][INFO] - [train] Step 14800 out of 25000 | Loss --> 0.705 | Grad_l2 --> 0.170 | Weights_l2 --> 47159.609 | Lr --> 0.000 | Seconds_per_step --> 1.775 | [2024-08-07 11:37:00,181][Main][INFO] - [train] Step 14850 out of 25000 | Loss --> 0.722 | Grad_l2 --> 0.165 | Weights_l2 --> 47159.472 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 11:38:25,726][Main][INFO] - [train] Step 14900 out of 25000 | Loss --> 0.696 | Grad_l2 --> 0.174 | Weights_l2 --> 47159.335 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 11:39:53,934][Main][INFO] - [train] Step 14950 out of 25000 | Loss --> 0.732 | Grad_l2 --> 0.180 | Weights_l2 --> 47159.198 | Lr --> 0.000 | Seconds_per_step --> 1.764 | [2024-08-07 11:41:19,559][Main][INFO] - [train] Step 15000 out of 25000 | Loss --> 0.705 | Grad_l2 --> 0.173 | Weights_l2 --> 47159.060 | Lr --> 0.000 | Seconds_per_step --> 1.712 | [2024-08-07 11:41:19,560][accelerate.accelerator][INFO] - Saving current state to checkpoint-ft-15000 [2024-08-07 11:41:19,566][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'lm_head.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-07 11:41:20,382][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-ft-15000/model.safetensors [2024-08-07 11:41:21,526][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-ft-15000/optimizer.bin [2024-08-07 11:41:21,527][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-ft-15000/scheduler.bin [2024-08-07 11:41:21,527][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-ft-15000/sampler.bin [2024-08-07 11:41:21,527][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-ft-15000/sampler_1.bin [2024-08-07 11:41:21,528][accelerate.checkpointing][INFO] - Random states saved in checkpoint-ft-15000/random_states_0.pkl [2024-08-07 11:42:47,021][Main][INFO] - [train] Step 15050 out of 25000 | Loss --> 0.707 | Grad_l2 --> 0.173 | Weights_l2 --> 47158.927 | Lr --> 0.000 | Seconds_per_step --> 1.749 | [2024-08-07 11:44:15,207][Main][INFO] - [train] Step 15100 out of 25000 | Loss --> 0.731 | Grad_l2 --> 0.153 | Weights_l2 --> 47158.786 | Lr --> 0.000 | Seconds_per_step --> 1.764 | [2024-08-07 11:45:41,007][Main][INFO] - [train] Step 15150 out of 25000 | Loss --> 0.724 | Grad_l2 --> 0.166 | Weights_l2 --> 47158.648 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 11:47:07,191][Main][INFO] - [train] Step 15200 out of 25000 | Loss --> 0.738 | Grad_l2 --> 0.170 | Weights_l2 --> 47158.515 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 11:48:35,506][Main][INFO] - [train] Step 15250 out of 25000 | Loss --> 0.715 | Grad_l2 --> 0.158 | Weights_l2 --> 47158.378 | Lr --> 0.000 | Seconds_per_step --> 1.766 | [2024-08-07 11:50:01,669][Main][INFO] - [train] Step 15300 out of 25000 | Loss --> 0.733 | Grad_l2 --> 0.166 | Weights_l2 --> 47158.237 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 11:51:27,955][Main][INFO] - [train] Step 15350 out of 
25000 | Loss --> 0.725 | Grad_l2 --> 0.172 | Weights_l2 --> 47158.099 | Lr --> 0.000 | Seconds_per_step --> 1.726 | [2024-08-07 11:52:56,776][Main][INFO] - [train] Step 15400 out of 25000 | Loss --> 0.708 | Grad_l2 --> 0.151 | Weights_l2 --> 47157.962 | Lr --> 0.000 | Seconds_per_step --> 1.776 | [2024-08-07 11:54:22,548][Main][INFO] - [train] Step 15450 out of 25000 | Loss --> 0.750 | Grad_l2 --> 0.157 | Weights_l2 --> 47157.825 | Lr --> 0.000 | Seconds_per_step --> 1.715 | [2024-08-07 11:55:48,808][Main][INFO] - [train] Step 15500 out of 25000 | Loss --> 0.766 | Grad_l2 --> 0.171 | Weights_l2 --> 47157.687 | Lr --> 0.000 | Seconds_per_step --> 1.725 | [2024-08-07 11:57:17,585][Main][INFO] - [train] Step 15550 out of 25000 | Loss --> 0.734 | Grad_l2 --> 0.169 | Weights_l2 --> 47157.554 | Lr --> 0.000 | Seconds_per_step --> 1.776 | [2024-08-07 11:58:43,740][Main][INFO] - [train] Step 15600 out of 25000 | Loss --> 0.714 | Grad_l2 --> 0.155 | Weights_l2 --> 47157.417 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 12:00:12,386][Main][INFO] - [train] Step 15650 out of 25000 | Loss --> 0.745 | Grad_l2 --> 0.169 | Weights_l2 --> 47157.279 | Lr --> 0.000 | Seconds_per_step --> 1.773 | [2024-08-07 12:01:38,506][Main][INFO] - [train] Step 15700 out of 25000 | Loss --> 0.732 | Grad_l2 --> 0.223 | Weights_l2 --> 47157.142 | Lr --> 0.000 | Seconds_per_step --> 1.722 | [2024-08-07 12:03:04,291][Main][INFO] - [train] Step 15750 out of 25000 | Loss --> 0.718 | Grad_l2 --> 0.161 | Weights_l2 --> 47157.005 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 12:04:32,531][Main][INFO] - [train] Step 15800 out of 25000 | Loss --> 0.743 | Grad_l2 --> 0.177 | Weights_l2 --> 47156.871 | Lr --> 0.000 | Seconds_per_step --> 1.765 | [2024-08-07 12:05:58,031][Main][INFO] - [train] Step 15850 out of 25000 | Loss --> 0.754 | Grad_l2 --> 0.167 | Weights_l2 --> 47156.730 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 12:07:23,569][Main][INFO] - [train] Step 15900 out 
of 25000 | Loss --> 0.730 | Grad_l2 --> 0.200 | Weights_l2 --> 47156.593 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 12:08:51,789][Main][INFO] - [train] Step 15950 out of 25000 | Loss --> 0.750 | Grad_l2 --> 0.196 | Weights_l2 --> 47156.456 | Lr --> 0.000 | Seconds_per_step --> 1.764 | [2024-08-07 12:10:17,593][Main][INFO] - [train] Step 16000 out of 25000 | Loss --> 0.734 | Grad_l2 --> 0.158 | Weights_l2 --> 47156.318 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 12:10:22,535][Main][INFO] - [eval] Step 16000 out of 25000 | Loss --> 0.830 | Accuracy --> 0.846 | Time --> 4.939 | [2024-08-07 12:14:42,758][absl][INFO] - Using default tokenizer. [2024-08-07 12:14:43,321][Main][INFO] - [test] Step 16000 out of 25000 | Rougel --> 24.234 | Time --> 260.785 | [2024-08-07 12:16:09,102][Main][INFO] - [train] Step 16050 out of 25000 | Loss --> 0.724 | Grad_l2 --> 0.153 | Weights_l2 --> 47156.181 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 12:17:38,179][Main][INFO] - [train] Step 16100 out of 25000 | Loss --> 0.746 | Grad_l2 --> 0.173 | Weights_l2 --> 47156.044 | Lr --> 0.000 | Seconds_per_step --> 1.782 | [2024-08-07 12:19:04,016][Main][INFO] - [train] Step 16150 out of 25000 | Loss --> 0.746 | Grad_l2 --> 0.207 | Weights_l2 --> 47155.906 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-08-07 12:20:29,950][Main][INFO] - [train] Step 16200 out of 25000 | Loss --> 0.749 | Grad_l2 --> 0.155 | Weights_l2 --> 47155.769 | Lr --> 0.000 | Seconds_per_step --> 1.719 | [2024-08-07 12:21:58,460][Main][INFO] - [train] Step 16250 out of 25000 | Loss --> 0.730 | Grad_l2 --> 0.171 | Weights_l2 --> 47155.632 | Lr --> 0.000 | Seconds_per_step --> 1.770 | [2024-08-07 12:23:24,644][Main][INFO] - [train] Step 16300 out of 25000 | Loss --> 0.738 | Grad_l2 --> 0.172 | Weights_l2 --> 47155.495 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 12:24:50,811][Main][INFO] - [train] Step 16350 out of 25000 | Loss --> 0.726 | Grad_l2 --> 0.159 | 
Weights_l2 --> 47155.357 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 12:26:18,994][Main][INFO] - [train] Step 16400 out of 25000 | Loss --> 0.726 | Grad_l2 --> 0.155 | Weights_l2 --> 47155.224 | Lr --> 0.000 | Seconds_per_step --> 1.764 | [2024-08-07 12:27:44,559][Main][INFO] - [train] Step 16450 out of 25000 | Loss --> 0.721 | Grad_l2 --> 0.169 | Weights_l2 --> 47155.086 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 12:29:10,639][Main][INFO] - [train] Step 16500 out of 25000 | Loss --> 0.733 | Grad_l2 --> 0.174 | Weights_l2 --> 47154.949 | Lr --> 0.000 | Seconds_per_step --> 1.722 | [2024-08-07 12:30:38,659][Main][INFO] - [train] Step 16550 out of 25000 | Loss --> 0.712 | Grad_l2 --> 0.174 | Weights_l2 --> 47154.812 | Lr --> 0.000 | Seconds_per_step --> 1.760 | [2024-08-07 12:32:04,163][Main][INFO] - [train] Step 16600 out of 25000 | Loss --> 0.718 | Grad_l2 --> 0.152 | Weights_l2 --> 47154.675 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 12:33:29,664][Main][INFO] - [train] Step 16650 out of 25000 | Loss --> 0.734 | Grad_l2 --> 0.196 | Weights_l2 --> 47154.537 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 12:34:57,619][Main][INFO] - [train] Step 16700 out of 25000 | Loss --> 0.745 | Grad_l2 --> 0.158 | Weights_l2 --> 47154.400 | Lr --> 0.000 | Seconds_per_step --> 1.759 | [2024-08-07 12:36:23,705][Main][INFO] - [train] Step 16750 out of 25000 | Loss --> 0.754 | Grad_l2 --> 0.228 | Weights_l2 --> 47154.263 | Lr --> 0.000 | Seconds_per_step --> 1.722 | [2024-08-07 12:37:51,699][Main][INFO] - [train] Step 16800 out of 25000 | Loss --> 0.744 | Grad_l2 --> 0.185 | Weights_l2 --> 47154.126 | Lr --> 0.000 | Seconds_per_step --> 1.760 | [2024-08-07 12:39:17,171][Main][INFO] - [train] Step 16850 out of 25000 | Loss --> 0.742 | Grad_l2 --> 0.172 | Weights_l2 --> 47153.988 | Lr --> 0.000 | Seconds_per_step --> 1.709 | [2024-08-07 12:40:43,000][Main][INFO] - [train] Step 16900 out of 25000 | Loss --> 0.736 | Grad_l2 --> 0.216 
| Weights_l2 --> 47153.851 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-08-07 12:42:11,725][Main][INFO] - [train] Step 16950 out of 25000 | Loss --> 0.737 | Grad_l2 --> 0.212 | Weights_l2 --> 47153.714 | Lr --> 0.000 | Seconds_per_step --> 1.774 | [2024-08-07 12:43:38,101][Main][INFO] - [train] Step 17000 out of 25000 | Loss --> 0.737 | Grad_l2 --> 0.188 | Weights_l2 --> 47153.576 | Lr --> 0.000 | Seconds_per_step --> 1.728 | [2024-08-07 12:45:04,080][Main][INFO] - [train] Step 17050 out of 25000 | Loss --> 0.753 | Grad_l2 --> 0.337 | Weights_l2 --> 47153.439 | Lr --> 0.000 | Seconds_per_step --> 1.720 | [2024-08-07 12:46:32,474][Main][INFO] - [train] Step 17100 out of 25000 | Loss --> 0.720 | Grad_l2 --> 0.165 | Weights_l2 --> 47153.302 | Lr --> 0.000 | Seconds_per_step --> 1.768 | [2024-08-07 12:47:58,632][Main][INFO] - [train] Step 17150 out of 25000 | Loss --> 0.735 | Grad_l2 --> 0.162 | Weights_l2 --> 47153.165 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 12:49:25,027][Main][INFO] - [train] Step 17200 out of 25000 | Loss --> 0.742 | Grad_l2 --> 0.167 | Weights_l2 --> 47153.031 | Lr --> 0.000 | Seconds_per_step --> 1.728 | [2024-08-07 12:50:53,829][Main][INFO] - [train] Step 17250 out of 25000 | Loss --> 0.732 | Grad_l2 --> 0.166 | Weights_l2 --> 47152.894 | Lr --> 0.000 | Seconds_per_step --> 1.776 | [2024-08-07 12:52:19,460][Main][INFO] - [train] Step 17300 out of 25000 | Loss --> 0.710 | Grad_l2 --> 0.151 | Weights_l2 --> 47152.757 | Lr --> 0.000 | Seconds_per_step --> 1.713 | [2024-08-07 12:53:45,284][Main][INFO] - [train] Step 17350 out of 25000 | Loss --> 0.748 | Grad_l2 --> 0.155 | Weights_l2 --> 47152.619 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 12:55:14,159][Main][INFO] - [train] Step 17400 out of 25000 | Loss --> 0.708 | Grad_l2 --> 0.151 | Weights_l2 --> 47152.482 | Lr --> 0.000 | Seconds_per_step --> 1.777 | [2024-08-07 12:56:40,014][Main][INFO] - [train] Step 17450 out of 25000 | Loss --> 0.739 | Grad_l2 --> 
0.181 | Weights_l2 --> 47152.345 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-08-07 12:58:05,500][Main][INFO] - [train] Step 17500 out of 25000 | Loss --> 0.724 | Grad_l2 --> 0.196 | Weights_l2 --> 47152.211 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 12:59:33,742][Main][INFO] - [train] Step 17550 out of 25000 | Loss --> 0.705 | Grad_l2 --> 0.174 | Weights_l2 --> 47152.070 | Lr --> 0.000 | Seconds_per_step --> 1.765 | [2024-08-07 13:00:59,255][Main][INFO] - [train] Step 17600 out of 25000 | Loss --> 0.726 | Grad_l2 --> 0.167 | Weights_l2 --> 47151.933 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 13:02:24,974][Main][INFO] - [train] Step 17650 out of 25000 | Loss --> 0.733 | Grad_l2 --> 0.185 | Weights_l2 --> 47151.796 | Lr --> 0.000 | Seconds_per_step --> 1.714 | [2024-08-07 13:03:53,834][Main][INFO] - [train] Step 17700 out of 25000 | Loss --> 0.753 | Grad_l2 --> 0.169 | Weights_l2 --> 47151.658 | Lr --> 0.000 | Seconds_per_step --> 1.777 | [2024-08-07 13:05:19,816][Main][INFO] - [train] Step 17750 out of 25000 | Loss --> 0.720 | Grad_l2 --> 0.179 | Weights_l2 --> 47151.521 | Lr --> 0.000 | Seconds_per_step --> 1.720 | [2024-08-07 13:06:48,589][Main][INFO] - [train] Step 17800 out of 25000 | Loss --> 0.724 | Grad_l2 --> 0.169 | Weights_l2 --> 47151.384 | Lr --> 0.000 | Seconds_per_step --> 1.775 | [2024-08-07 13:08:14,874][Main][INFO] - [train] Step 17850 out of 25000 | Loss --> 0.728 | Grad_l2 --> 0.185 | Weights_l2 --> 47151.246 | Lr --> 0.000 | Seconds_per_step --> 1.726 | [2024-08-07 13:09:40,853][Main][INFO] - [train] Step 17900 out of 25000 | Loss --> 0.706 | Grad_l2 --> 0.161 | Weights_l2 --> 47151.109 | Lr --> 0.000 | Seconds_per_step --> 1.720 | [2024-08-07 13:11:08,861][Main][INFO] - [train] Step 17950 out of 25000 | Loss --> 0.751 | Grad_l2 --> 0.161 | Weights_l2 --> 47150.972 | Lr --> 0.000 | Seconds_per_step --> 1.760 | [2024-08-07 13:12:34,980][Main][INFO] - [train] Step 18000 out of 25000 | Loss --> 0.745 | Grad_l2 
--> 0.257 | Weights_l2 --> 47150.835 | Lr --> 0.000 | Seconds_per_step --> 1.722 | [2024-08-07 13:14:01,362][Main][INFO] - [train] Step 18050 out of 25000 | Loss --> 0.760 | Grad_l2 --> 0.166 | Weights_l2 --> 47150.701 | Lr --> 0.000 | Seconds_per_step --> 1.728 | [2024-08-07 13:15:29,325][Main][INFO] - [train] Step 18100 out of 25000 | Loss --> 0.724 | Grad_l2 --> 0.159 | Weights_l2 --> 47150.564 | Lr --> 0.000 | Seconds_per_step --> 1.759 | [2024-08-07 13:16:54,802][Main][INFO] - [train] Step 18150 out of 25000 | Loss --> 0.733 | Grad_l2 --> 0.163 | Weights_l2 --> 47150.427 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 13:18:20,272][Main][INFO] - [train] Step 18200 out of 25000 | Loss --> 0.723 | Grad_l2 --> 0.157 | Weights_l2 --> 47150.289 | Lr --> 0.000 | Seconds_per_step --> 1.709 | [2024-08-07 13:19:48,874][Main][INFO] - [train] Step 18250 out of 25000 | Loss --> 0.738 | Grad_l2 --> 0.186 | Weights_l2 --> 47150.152 | Lr --> 0.000 | Seconds_per_step --> 1.772 | [2024-08-07 13:21:14,606][Main][INFO] - [train] Step 18300 out of 25000 | Loss --> 0.750 | Grad_l2 --> 0.155 | Weights_l2 --> 47150.015 | Lr --> 0.000 | Seconds_per_step --> 1.715 | [2024-08-07 13:22:40,318][Main][INFO] - [train] Step 18350 out of 25000 | Loss --> 0.732 | Grad_l2 --> 0.151 | Weights_l2 --> 47149.878 | Lr --> 0.000 | Seconds_per_step --> 1.714 | [2024-08-07 13:24:08,741][Main][INFO] - [train] Step 18400 out of 25000 | Loss --> 0.716 | Grad_l2 --> 0.172 | Weights_l2 --> 47149.740 | Lr --> 0.000 | Seconds_per_step --> 1.768 | [2024-08-07 13:25:34,410][Main][INFO] - [train] Step 18450 out of 25000 | Loss --> 0.720 | Grad_l2 --> 0.164 | Weights_l2 --> 47149.603 | Lr --> 0.000 | Seconds_per_step --> 1.713 | [2024-08-07 13:27:00,745][Main][INFO] - [train] Step 18500 out of 25000 | Loss --> 0.731 | Grad_l2 --> 0.156 | Weights_l2 --> 47149.466 | Lr --> 0.000 | Seconds_per_step --> 1.727 | [2024-08-07 13:28:29,717][Main][INFO] - [train] Step 18550 out of 25000 | Loss --> 0.723 | 
Grad_l2 --> 0.153 | Weights_l2 --> 47149.328 | Lr --> 0.000 | Seconds_per_step --> 1.779 | [2024-08-07 13:29:55,948][Main][INFO] - [train] Step 18600 out of 25000 | Loss --> 0.709 | Grad_l2 --> 0.167 | Weights_l2 --> 47149.191 | Lr --> 0.000 | Seconds_per_step --> 1.725 | [2024-08-07 13:31:22,096][Main][INFO] - [train] Step 18650 out of 25000 | Loss --> 0.720 | Grad_l2 --> 0.162 | Weights_l2 --> 47149.054 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 13:32:50,690][Main][INFO] - [train] Step 18700 out of 25000 | Loss --> 0.712 | Grad_l2 --> 0.182 | Weights_l2 --> 47148.916 | Lr --> 0.000 | Seconds_per_step --> 1.772 | [2024-08-07 13:34:16,757][Main][INFO] - [train] Step 18750 out of 25000 | Loss --> 0.736 | Grad_l2 --> 0.196 | Weights_l2 --> 47148.779 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-08-07 13:35:42,343][Main][INFO] - [train] Step 18800 out of 25000 | Loss --> 0.720 | Grad_l2 --> 0.199 | Weights_l2 --> 47148.642 | Lr --> 0.000 | Seconds_per_step --> 1.712 | [2024-08-07 13:37:10,652][Main][INFO] - [train] Step 18850 out of 25000 | Loss --> 0.720 | Grad_l2 --> 0.161 | Weights_l2 --> 47148.504 | Lr --> 0.000 | Seconds_per_step --> 1.766 | [2024-08-07 13:38:41,717][Main][INFO] - [train] Step 18900 out of 25000 | Loss --> 0.751 | Grad_l2 --> 0.159 | Weights_l2 --> 47148.367 | Lr --> 0.000 | Seconds_per_step --> 1.821 | [2024-08-07 13:40:07,531][Main][INFO] - [train] Step 18950 out of 25000 | Loss --> 0.726 | Grad_l2 --> 0.186 | Weights_l2 --> 47148.234 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 13:41:35,943][Main][INFO] - [train] Step 19000 out of 25000 | Loss --> 0.743 | Grad_l2 --> 0.260 | Weights_l2 --> 47148.096 | Lr --> 0.000 | Seconds_per_step --> 1.768 | [2024-08-07 13:43:02,256][Main][INFO] - [train] Step 19050 out of 25000 | Loss --> 0.727 | Grad_l2 --> 0.218 | Weights_l2 --> 47147.959 | Lr --> 0.000 | Seconds_per_step --> 1.726 | [2024-08-07 13:44:28,589][Main][INFO] - [train] Step 19100 out of 25000 | Loss --> 0.697 
| Grad_l2 --> 0.156 | Weights_l2 --> 47147.822 | Lr --> 0.000 | Seconds_per_step --> 1.727 | [2024-08-07 13:45:57,171][Main][INFO] - [train] Step 19150 out of 25000 | Loss --> 0.726 | Grad_l2 --> 0.205 | Weights_l2 --> 47147.685 | Lr --> 0.000 | Seconds_per_step --> 1.772 | [2024-08-07 13:47:22,671][Main][INFO] - [train] Step 19200 out of 25000 | Loss --> 0.720 | Grad_l2 --> 0.212 | Weights_l2 --> 47147.547 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 13:48:50,958][Main][INFO] - [train] Step 19250 out of 25000 | Loss --> 0.702 | Grad_l2 --> 0.176 | Weights_l2 --> 47147.410 | Lr --> 0.000 | Seconds_per_step --> 1.766 | [2024-08-07 13:50:16,489][Main][INFO] - [train] Step 19300 out of 25000 | Loss --> 0.694 | Grad_l2 --> 0.156 | Weights_l2 --> 47147.273 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 13:51:42,217][Main][INFO] - [train] Step 19350 out of 25000 | Loss --> 0.678 | Grad_l2 --> 0.158 | Weights_l2 --> 47147.135 | Lr --> 0.000 | Seconds_per_step --> 1.715 | [2024-08-07 13:53:10,925][Main][INFO] - [train] Step 19400 out of 25000 | Loss --> 0.708 | Grad_l2 --> 0.167 | Weights_l2 --> 47146.998 | Lr --> 0.000 | Seconds_per_step --> 1.774 | [2024-08-07 13:54:37,089][Main][INFO] - [train] Step 19450 out of 25000 | Loss --> 0.721 | Grad_l2 --> 0.175 | Weights_l2 --> 47146.861 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 13:56:03,239][Main][INFO] - [train] Step 19500 out of 25000 | Loss --> 0.707 | Grad_l2 --> 0.158 | Weights_l2 --> 47146.723 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 13:57:31,897][Main][INFO] - [train] Step 19550 out of 25000 | Loss --> 0.684 | Grad_l2 --> 0.158 | Weights_l2 --> 47146.586 | Lr --> 0.000 | Seconds_per_step --> 1.773 | [2024-08-07 13:58:57,731][Main][INFO] - [train] Step 19600 out of 25000 | Loss --> 0.725 | Grad_l2 --> 0.158 | Weights_l2 --> 47146.449 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-08-07 14:00:23,264][Main][INFO] - [train] Step 19650 out of 25000 | Loss --> 
0.699 | Grad_l2 --> 0.163 | Weights_l2 --> 47146.312 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 14:01:51,434][Main][INFO] - [train] Step 19700 out of 25000 | Loss --> 0.723 | Grad_l2 --> 0.169 | Weights_l2 --> 47146.174 | Lr --> 0.000 | Seconds_per_step --> 1.763 | [2024-08-07 14:03:17,033][Main][INFO] - [train] Step 19750 out of 25000 | Loss --> 0.651 | Grad_l2 --> 0.190 | Weights_l2 --> 47146.037 | Lr --> 0.000 | Seconds_per_step --> 1.712 | [2024-08-07 14:04:42,658][Main][INFO] - [train] Step 19800 out of 25000 | Loss --> 0.686 | Grad_l2 --> 0.156 | Weights_l2 --> 47145.900 | Lr --> 0.000 | Seconds_per_step --> 1.712 | [2024-08-07 14:06:10,961][Main][INFO] - [train] Step 19850 out of 25000 | Loss --> 0.687 | Grad_l2 --> 0.160 | Weights_l2 --> 47145.766 | Lr --> 0.000 | Seconds_per_step --> 1.766 | [2024-08-07 14:07:36,890][Main][INFO] - [train] Step 19900 out of 25000 | Loss --> 0.698 | Grad_l2 --> 0.177 | Weights_l2 --> 47145.629 | Lr --> 0.000 | Seconds_per_step --> 1.719 | [2024-08-07 14:09:03,084][Main][INFO] - [train] Step 19950 out of 25000 | Loss --> 0.720 | Grad_l2 --> 0.166 | Weights_l2 --> 47145.492 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 14:10:31,989][Main][INFO] - [train] Step 20000 out of 25000 | Loss --> 0.668 | Grad_l2 --> 0.157 | Weights_l2 --> 47145.354 | Lr --> 0.000 | Seconds_per_step --> 1.778 | [2024-08-07 14:10:36,920][Main][INFO] - [eval] Step 20000 out of 25000 | Loss --> 0.811 | Accuracy --> 0.849 | Time --> 4.928 | [2024-08-07 14:15:07,226][absl][INFO] - Using default tokenizer. [2024-08-07 14:15:07,807][Main][INFO] - [test] Step 20000 out of 25000 | Rougel --> 25.044 | Time --> 270.886 | [2024-08-07 14:15:07,811][accelerate.accelerator][INFO] - Saving current state to checkpoint-ft-20000 [2024-08-07 14:15:07,819][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'lm_head.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-07 14:15:08,650][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-ft-20000/model.safetensors [2024-08-07 14:15:09,813][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-ft-20000/optimizer.bin [2024-08-07 14:15:09,814][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-ft-20000/scheduler.bin [2024-08-07 14:15:09,814][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-ft-20000/sampler.bin [2024-08-07 14:15:09,814][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-ft-20000/sampler_1.bin [2024-08-07 14:15:09,815][accelerate.checkpointing][INFO] - Random states saved in checkpoint-ft-20000/random_states_0.pkl [2024-08-07 14:16:36,010][Main][INFO] - [train] Step 20050 out of 25000 | Loss --> 0.686 | Grad_l2 --> 0.170 | Weights_l2 --> 47145.217 | Lr --> 0.000 | Seconds_per_step --> 1.764 | [2024-08-07 14:18:02,209][Main][INFO] - [train] Step 20100 out of 25000 | Loss --> 0.709 | Grad_l2 --> 0.167 | Weights_l2 --> 47145.080 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 14:19:30,971][Main][INFO] - [train] Step 20150 out of 25000 | Loss --> 0.682 | Grad_l2 --> 0.173 | Weights_l2 --> 47144.943 | Lr --> 0.000 | Seconds_per_step --> 1.775 | [2024-08-07 14:20:56,968][Main][INFO] - [train] Step 20200 out of 25000 | Loss --> 0.662 | Grad_l2 --> 0.154 | Weights_l2 --> 47144.809 | Lr --> 0.000 | Seconds_per_step --> 1.720 | [2024-08-07 14:22:22,622][Main][INFO] - [train] Step 20250 out of 25000 | Loss --> 0.668 | Grad_l2 --> 0.173 | Weights_l2 --> 47144.668 | Lr --> 0.000 | Seconds_per_step --> 1.713 | [2024-08-07 14:23:50,811][Main][INFO] - [train] Step 20300 out of 25000 | Loss --> 0.648 | Grad_l2 --> 0.173 | Weights_l2 --> 47144.531 | Lr --> 0.000 | Seconds_per_step --> 1.764 | [2024-08-07 14:25:16,348][Main][INFO] - [train] Step 20350 out of 
25000 | Loss --> 0.653 | Grad_l2 --> 0.181 | Weights_l2 --> 47144.397 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 14:26:41,867][Main][INFO] - [train] Step 20400 out of 25000 | Loss --> 0.651 | Grad_l2 --> 0.159 | Weights_l2 --> 47144.256 | Lr --> 0.000 | Seconds_per_step --> 1.710 | [2024-08-07 14:28:09,838][Main][INFO] - [train] Step 20450 out of 25000 | Loss --> 0.667 | Grad_l2 --> 0.268 | Weights_l2 --> 47144.123 | Lr --> 0.000 | Seconds_per_step --> 1.759 | [2024-08-07 14:29:35,617][Main][INFO] - [train] Step 20500 out of 25000 | Loss --> 0.660 | Grad_l2 --> 0.167 | Weights_l2 --> 47143.985 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 14:31:08,507][Main][INFO] - [train] Step 20550 out of 25000 | Loss --> 0.660 | Grad_l2 --> 0.154 | Weights_l2 --> 47143.848 | Lr --> 0.000 | Seconds_per_step --> 1.858 | [2024-08-07 14:32:37,146][Main][INFO] - [train] Step 20600 out of 25000 | Loss --> 0.667 | Grad_l2 --> 0.173 | Weights_l2 --> 47143.711 | Lr --> 0.000 | Seconds_per_step --> 1.773 | [2024-08-07 14:34:02,981][Main][INFO] - [train] Step 20650 out of 25000 | Loss --> 0.663 | Grad_l2 --> 0.157 | Weights_l2 --> 47143.573 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-08-07 14:35:28,537][Main][INFO] - [train] Step 20700 out of 25000 | Loss --> 0.641 | Grad_l2 --> 0.158 | Weights_l2 --> 47143.440 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 14:36:56,530][Main][INFO] - [train] Step 20750 out of 25000 | Loss --> 0.623 | Grad_l2 --> 0.185 | Weights_l2 --> 47143.303 | Lr --> 0.000 | Seconds_per_step --> 1.760 | [2024-08-07 14:38:22,846][Main][INFO] - [train] Step 20800 out of 25000 | Loss --> 0.636 | Grad_l2 --> 0.152 | Weights_l2 --> 47143.162 | Lr --> 0.000 | Seconds_per_step --> 1.726 | [2024-08-07 14:39:48,731][Main][INFO] - [train] Step 20850 out of 25000 | Loss --> 0.630 | Grad_l2 --> 0.153 | Weights_l2 --> 47143.024 | Lr --> 0.000 | Seconds_per_step --> 1.718 | [2024-08-07 14:41:16,867][Main][INFO] - [train] Step 20900 out 
of 25000 | Loss --> 0.617 | Grad_l2 --> 0.353 | Weights_l2 --> 47142.887 | Lr --> 0.000 | Seconds_per_step --> 1.763 | [2024-08-07 14:42:43,044][Main][INFO] - [train] Step 20950 out of 25000 | Loss --> 0.607 | Grad_l2 --> 0.154 | Weights_l2 --> 47142.750 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 14:44:11,795][Main][INFO] - [train] Step 21000 out of 25000 | Loss --> 0.622 | Grad_l2 --> 0.153 | Weights_l2 --> 47142.612 | Lr --> 0.000 | Seconds_per_step --> 1.775 | [2024-08-07 14:45:37,402][Main][INFO] - [train] Step 21050 out of 25000 | Loss --> 0.620 | Grad_l2 --> 0.165 | Weights_l2 --> 47142.475 | Lr --> 0.000 | Seconds_per_step --> 1.712 | [2024-08-07 14:47:03,753][Main][INFO] - [train] Step 21100 out of 25000 | Loss --> 0.590 | Grad_l2 --> 0.157 | Weights_l2 --> 47142.341 | Lr --> 0.000 | Seconds_per_step --> 1.727 | [2024-08-07 14:48:32,029][Main][INFO] - [train] Step 21150 out of 25000 | Loss --> 0.610 | Grad_l2 --> 0.151 | Weights_l2 --> 47142.204 | Lr --> 0.000 | Seconds_per_step --> 1.766 | [2024-08-07 14:49:57,716][Main][INFO] - [train] Step 21200 out of 25000 | Loss --> 0.604 | Grad_l2 --> 0.191 | Weights_l2 --> 47142.067 | Lr --> 0.000 | Seconds_per_step --> 1.714 | [2024-08-07 14:51:23,943][Main][INFO] - [train] Step 21250 out of 25000 | Loss --> 0.591 | Grad_l2 --> 0.151 | Weights_l2 --> 47141.930 | Lr --> 0.000 | Seconds_per_step --> 1.725 | [2024-08-07 14:52:51,971][Main][INFO] - [train] Step 21300 out of 25000 | Loss --> 0.583 | Grad_l2 --> 0.156 | Weights_l2 --> 47141.792 | Lr --> 0.000 | Seconds_per_step --> 1.761 | [2024-08-07 14:54:17,847][Main][INFO] - [train] Step 21350 out of 25000 | Loss --> 0.571 | Grad_l2 --> 0.143 | Weights_l2 --> 47141.655 | Lr --> 0.000 | Seconds_per_step --> 1.718 | [2024-08-07 14:55:44,079][Main][INFO] - [train] Step 21400 out of 25000 | Loss --> 0.590 | Grad_l2 --> 0.154 | Weights_l2 --> 47141.521 | Lr --> 0.000 | Seconds_per_step --> 1.725 | [2024-08-07 14:57:13,087][Main][INFO] - [train] Step 21450 
out of 25000 | Loss --> 0.573 | Grad_l2 --> 0.205 | Weights_l2 --> 47141.380 | Lr --> 0.000 | Seconds_per_step --> 1.780 | [2024-08-07 14:58:39,380][Main][INFO] - [train] Step 21500 out of 25000 | Loss --> 0.576 | Grad_l2 --> 0.173 | Weights_l2 --> 47141.247 | Lr --> 0.000 | Seconds_per_step --> 1.726 | [2024-08-07 15:00:05,026][Main][INFO] - [train] Step 21550 out of 25000 | Loss --> 0.588 | Grad_l2 --> 0.161 | Weights_l2 --> 47141.109 | Lr --> 0.000 | Seconds_per_step --> 1.713 | [2024-08-07 15:01:33,627][Main][INFO] - [train] Step 21600 out of 25000 | Loss --> 0.572 | Grad_l2 --> 0.143 | Weights_l2 --> 47140.972 | Lr --> 0.000 | Seconds_per_step --> 1.772 | [2024-08-07 15:02:59,931][Main][INFO] - [train] Step 21650 out of 25000 | Loss --> 0.550 | Grad_l2 --> 0.162 | Weights_l2 --> 47140.835 | Lr --> 0.000 | Seconds_per_step --> 1.726 | [2024-08-07 15:04:25,989][Main][INFO] - [train] Step 21700 out of 25000 | Loss --> 0.546 | Grad_l2 --> 0.141 | Weights_l2 --> 47140.697 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-08-07 15:05:54,653][Main][INFO] - [train] Step 21750 out of 25000 | Loss --> 0.563 | Grad_l2 --> 0.190 | Weights_l2 --> 47140.560 | Lr --> 0.000 | Seconds_per_step --> 1.773 | [2024-08-07 15:07:20,787][Main][INFO] - [train] Step 21800 out of 25000 | Loss --> 0.544 | Grad_l2 --> 0.146 | Weights_l2 --> 47140.423 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 15:08:46,493][Main][INFO] - [train] Step 21850 out of 25000 | Loss --> 0.530 | Grad_l2 --> 0.152 | Weights_l2 --> 47140.285 | Lr --> 0.000 | Seconds_per_step --> 1.714 | [2024-08-07 15:10:14,807][Main][INFO] - [train] Step 21900 out of 25000 | Loss --> 0.550 | Grad_l2 --> 0.141 | Weights_l2 --> 47140.148 | Lr --> 0.000 | Seconds_per_step --> 1.766 | [2024-08-07 15:11:41,045][Main][INFO] - [train] Step 21950 out of 25000 | Loss --> 0.583 | Grad_l2 --> 0.156 | Weights_l2 --> 47140.011 | Lr --> 0.000 | Seconds_per_step --> 1.725 | [2024-08-07 15:13:07,027][Main][INFO] - [train] Step 
22000 out of 25000 | Loss --> 0.571 | Grad_l2 --> 0.173 | Weights_l2 --> 47139.873 | Lr --> 0.000 | Seconds_per_step --> 1.720 | [2024-08-07 15:14:35,682][Main][INFO] - [train] Step 22050 out of 25000 | Loss --> 0.557 | Grad_l2 --> 0.141 | Weights_l2 --> 47139.736 | Lr --> 0.000 | Seconds_per_step --> 1.773 | [2024-08-07 15:16:01,234][Main][INFO] - [train] Step 22100 out of 25000 | Loss --> 0.585 | Grad_l2 --> 0.152 | Weights_l2 --> 47139.599 | Lr --> 0.000 | Seconds_per_step --> 1.711 | [2024-08-07 15:17:27,093][Main][INFO] - [train] Step 22150 out of 25000 | Loss --> 0.575 | Grad_l2 --> 0.177 | Weights_l2 --> 47139.461 | Lr --> 0.000 | Seconds_per_step --> 1.717 | [2024-08-07 15:18:55,890][Main][INFO] - [train] Step 22200 out of 25000 | Loss --> 0.571 | Grad_l2 --> 0.152 | Weights_l2 --> 47139.324 | Lr --> 0.000 | Seconds_per_step --> 1.776 | [2024-08-07 15:20:22,205][Main][INFO] - [train] Step 22250 out of 25000 | Loss --> 0.547 | Grad_l2 --> 0.148 | Weights_l2 --> 47139.187 | Lr --> 0.000 | Seconds_per_step --> 1.726 | [2024-08-07 15:21:48,125][Main][INFO] - [train] Step 22300 out of 25000 | Loss --> 0.526 | Grad_l2 --> 0.159 | Weights_l2 --> 47139.049 | Lr --> 0.000 | Seconds_per_step --> 1.718 | [2024-08-07 15:23:16,919][Main][INFO] - [train] Step 22350 out of 25000 | Loss --> 0.544 | Grad_l2 --> 0.136 | Weights_l2 --> 47138.912 | Lr --> 0.000 | Seconds_per_step --> 1.776 | [2024-08-07 15:24:42,706][Main][INFO] - [train] Step 22400 out of 25000 | Loss --> 0.523 | Grad_l2 --> 0.149 | Weights_l2 --> 47138.779 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 15:26:08,493][Main][INFO] - [train] Step 22450 out of 25000 | Loss --> 0.532 | Grad_l2 --> 0.150 | Weights_l2 --> 47138.641 | Lr --> 0.000 | Seconds_per_step --> 1.716 | [2024-08-07 15:27:37,183][Main][INFO] - [train] Step 22500 out of 25000 | Loss --> 0.551 | Grad_l2 --> 0.138 | Weights_l2 --> 47138.504 | Lr --> 0.000 | Seconds_per_step --> 1.774 | [2024-08-07 15:29:03,362][Main][INFO] - [train] 
Step 22550 out of 25000 | Loss --> 0.534 | Grad_l2 --> 0.140 | Weights_l2 --> 47138.367 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 15:30:29,354][Main][INFO] - [train] Step 22600 out of 25000 | Loss --> 0.518 | Grad_l2 --> 0.151 | Weights_l2 --> 47138.229 | Lr --> 0.000 | Seconds_per_step --> 1.720 | [2024-08-07 15:31:58,038][Main][INFO] - [train] Step 22650 out of 25000 | Loss --> 0.507 | Grad_l2 --> 0.147 | Weights_l2 --> 47138.092 | Lr --> 0.000 | Seconds_per_step --> 1.774 | [2024-08-07 15:33:24,401][Main][INFO] - [train] Step 22700 out of 25000 | Loss --> 0.533 | Grad_l2 --> 0.153 | Weights_l2 --> 47137.955 | Lr --> 0.000 | Seconds_per_step --> 1.727 | [2024-08-07 15:34:53,188][Main][INFO] - [train] Step 22750 out of 25000 | Loss --> 0.550 | Grad_l2 --> 0.174 | Weights_l2 --> 47137.821 | Lr --> 0.000 | Seconds_per_step --> 1.776 | [2024-08-07 15:36:19,308][Main][INFO] - [train] Step 22800 out of 25000 | Loss --> 0.537 | Grad_l2 --> 0.166 | Weights_l2 --> 47137.684 | Lr --> 0.000 | Seconds_per_step --> 1.722 | [2024-08-07 15:37:46,010][Main][INFO] - [train] Step 22850 out of 25000 | Loss --> 0.514 | Grad_l2 --> 0.134 | Weights_l2 --> 47137.547 | Lr --> 0.000 | Seconds_per_step --> 1.734 | [2024-08-07 15:39:14,940][Main][INFO] - [train] Step 22900 out of 25000 | Loss --> 0.520 | Grad_l2 --> 0.151 | Weights_l2 --> 47137.409 | Lr --> 0.000 | Seconds_per_step --> 1.779 | [2024-08-07 15:40:41,381][Main][INFO] - [train] Step 22950 out of 25000 | Loss --> 0.525 | Grad_l2 --> 0.145 | Weights_l2 --> 47137.272 | Lr --> 0.000 | Seconds_per_step --> 1.729 | [2024-08-07 15:42:07,321][Main][INFO] - [train] Step 23000 out of 25000 | Loss --> 0.495 | Grad_l2 --> 0.174 | Weights_l2 --> 47137.135 | Lr --> 0.000 | Seconds_per_step --> 1.719 | [2024-08-07 15:43:36,566][Main][INFO] - [train] Step 23050 out of 25000 | Loss --> 0.509 | Grad_l2 --> 0.164 | Weights_l2 --> 47137.001 | Lr --> 0.000 | Seconds_per_step --> 1.785 | [2024-08-07 15:45:02,598][Main][INFO] - 
[train] Step 23100 out of 25000 | Loss --> 0.526 | Grad_l2 --> 0.144 | Weights_l2 --> 47136.860 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-08-07 15:46:28,802][Main][INFO] - [train] Step 23150 out of 25000 | Loss --> 0.503 | Grad_l2 --> 0.162 | Weights_l2 --> 47136.723 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 15:47:57,263][Main][INFO] - [train] Step 23200 out of 25000 | Loss --> 0.534 | Grad_l2 --> 0.158 | Weights_l2 --> 47136.585 | Lr --> 0.000 | Seconds_per_step --> 1.769 | [2024-08-07 15:49:23,486][Main][INFO] - [train] Step 23250 out of 25000 | Loss --> 0.482 | Grad_l2 --> 0.129 | Weights_l2 --> 47136.448 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 15:50:49,406][Main][INFO] - [train] Step 23300 out of 25000 | Loss --> 0.522 | Grad_l2 --> 0.144 | Weights_l2 --> 47136.315 | Lr --> 0.000 | Seconds_per_step --> 1.718 | [2024-08-07 15:52:17,839][Main][INFO] - [train] Step 23350 out of 25000 | Loss --> 0.509 | Grad_l2 --> 0.162 | Weights_l2 --> 47136.177 | Lr --> 0.000 | Seconds_per_step --> 1.769 | [2024-08-07 15:53:43,781][Main][INFO] - [train] Step 23400 out of 25000 | Loss --> 0.507 | Grad_l2 --> 0.168 | Weights_l2 --> 47136.036 | Lr --> 0.000 | Seconds_per_step --> 1.719 | [2024-08-07 15:55:09,882][Main][INFO] - [train] Step 23450 out of 25000 | Loss --> 0.505 | Grad_l2 --> 0.152 | Weights_l2 --> 47135.903 | Lr --> 0.000 | Seconds_per_step --> 1.722 | [2024-08-07 15:56:38,610][Main][INFO] - [train] Step 23500 out of 25000 | Loss --> 0.491 | Grad_l2 --> 0.245 | Weights_l2 --> 47135.765 | Lr --> 0.000 | Seconds_per_step --> 1.775 | [2024-08-07 15:58:04,564][Main][INFO] - [train] Step 23550 out of 25000 | Loss --> 0.515 | Grad_l2 --> 0.170 | Weights_l2 --> 47135.628 | Lr --> 0.000 | Seconds_per_step --> 1.719 | [2024-08-07 15:59:30,685][Main][INFO] - [train] Step 23600 out of 25000 | Loss --> 0.523 | Grad_l2 --> 0.152 | Weights_l2 --> 47135.491 | Lr --> 0.000 | Seconds_per_step --> 1.722 | [2024-08-07 16:00:59,511][Main][INFO] 
- [train] Step 23650 out of 25000 | Loss --> 0.490 | Grad_l2 --> 0.158 | Weights_l2 --> 47135.353 | Lr --> 0.000 | Seconds_per_step --> 1.777 | [2024-08-07 16:02:25,509][Main][INFO] - [train] Step 23700 out of 25000 | Loss --> 0.496 | Grad_l2 --> 0.129 | Weights_l2 --> 47135.216 | Lr --> 0.000 | Seconds_per_step --> 1.720 | [2024-08-07 16:03:51,651][Main][INFO] - [train] Step 23750 out of 25000 | Loss --> 0.498 | Grad_l2 --> 0.172 | Weights_l2 --> 47135.079 | Lr --> 0.000 | Seconds_per_step --> 1.723 | [2024-08-07 16:05:20,071][Main][INFO] - [train] Step 23800 out of 25000 | Loss --> 0.504 | Grad_l2 --> 0.142 | Weights_l2 --> 47134.942 | Lr --> 0.000 | Seconds_per_step --> 1.768 | [2024-08-07 16:06:46,001][Main][INFO] - [train] Step 23850 out of 25000 | Loss --> 0.502 | Grad_l2 --> 0.134 | Weights_l2 --> 47134.804 | Lr --> 0.000 | Seconds_per_step --> 1.719 | [2024-08-07 16:08:12,045][Main][INFO] - [train] Step 23900 out of 25000 | Loss --> 0.530 | Grad_l2 --> 0.156 | Weights_l2 --> 47134.667 | Lr --> 0.000 | Seconds_per_step --> 1.721 | [2024-08-07 16:09:40,623][Main][INFO] - [train] Step 23950 out of 25000 | Loss --> 0.510 | Grad_l2 --> 0.145 | Weights_l2 --> 47134.530 | Lr --> 0.000 | Seconds_per_step --> 1.772 | [2024-08-07 16:11:06,712][Main][INFO] - [train] Step 24000 out of 25000 | Loss --> 0.507 | Grad_l2 --> 0.138 | Weights_l2 --> 47134.392 | Lr --> 0.000 | Seconds_per_step --> 1.722 | [2024-08-07 16:11:11,695][Main][INFO] - [eval] Step 24000 out of 25000 | Loss --> 0.797 | Accuracy --> 0.851 | Time --> 4.979 | [2024-08-07 16:15:41,940][absl][INFO] - Using default tokenizer. 
[2024-08-07 16:15:42,533][Main][INFO] - [test] Step 24000 out of 25000 | Rougel --> 25.563 | Time --> 270.838 | [2024-08-07 16:17:08,748][Main][INFO] - [train] Step 24050 out of 25000 | Loss --> 0.479 | Grad_l2 --> 0.139 | Weights_l2 --> 47134.255 | Lr --> 0.000 | Seconds_per_step --> 1.724 | [2024-08-07 16:18:37,470][Main][INFO] - [train] Step 24100 out of 25000 | Loss --> 0.517 | Grad_l2 --> 0.137 | Weights_l2 --> 47134.118 | Lr --> 0.000 | Seconds_per_step --> 1.774 | [2024-08-07 16:20:03,822][Main][INFO] - [train] Step 24150 out of 25000 | Loss --> 0.483 | Grad_l2 --> 0.129 | Weights_l2 --> 47133.984 | Lr --> 0.000 | Seconds_per_step --> 1.727 | [2024-08-07 16:21:30,048][Main][INFO] - [train] Step 24200 out of 25000 | Loss --> 0.491 | Grad_l2 --> 0.134 | Weights_l2 --> 47133.847 | Lr --> 0.000 | Seconds_per_step --> 1.725 | [2024-08-07 16:22:59,173][Main][INFO] - [train] Step 24250 out of 25000 | Loss --> 0.488 | Grad_l2 --> 0.136 | Weights_l2 --> 47133.709 | Lr --> 0.000 | Seconds_per_step --> 1.782 | [2024-08-07 16:24:25,725][Main][INFO] - [train] Step 24300 out of 25000 | Loss --> 0.499 | Grad_l2 --> 0.139 | Weights_l2 --> 47133.572 | Lr --> 0.000 | Seconds_per_step --> 1.731 | [2024-08-07 16:25:52,236][Main][INFO] - [train] Step 24350 out of 25000 | Loss --> 0.492 | Grad_l2 --> 0.156 | Weights_l2 --> 47133.435 | Lr --> 0.000 | Seconds_per_step --> 1.730 | [2024-08-07 16:27:21,162][Main][INFO] - [train] Step 24400 out of 25000 | Loss --> 0.521 | Grad_l2 --> 0.151 | Weights_l2 --> 47133.297 | Lr --> 0.000 | Seconds_per_step --> 1.779 | [2024-08-07 16:28:47,562][Main][INFO] - [train] Step 24450 out of 25000 | Loss --> 0.491 | Grad_l2 --> 0.159 | Weights_l2 --> 47133.164 | Lr --> 0.000 | Seconds_per_step --> 1.728 | [2024-08-07 16:30:22,050][Main][INFO] - [train] Step 24500 out of 25000 | Loss --> 0.484 | Grad_l2 --> 0.135 | Weights_l2 --> 47133.023 | Lr --> 0.000 | Seconds_per_step --> 1.890 | [2024-08-07 16:31:51,066][Main][INFO] - [train] Step 24550 out of 
25000 | Loss --> 0.479 | Grad_l2 --> 0.134 | Weights_l2 --> 47132.889 | Lr --> 0.000 | Seconds_per_step --> 1.780 | [2024-08-07 16:33:17,643][Main][INFO] - [train] Step 24600 out of 25000 | Loss --> 0.489 | Grad_l2 --> 0.135 | Weights_l2 --> 47132.752 | Lr --> 0.000 | Seconds_per_step --> 1.732 | [2024-08-07 16:34:44,019][Main][INFO] - [train] Step 24650 out of 25000 | Loss --> 0.507 | Grad_l2 --> 0.133 | Weights_l2 --> 47132.615 | Lr --> 0.000 | Seconds_per_step --> 1.727 | [2024-08-07 16:36:12,997][Main][INFO] - [train] Step 24700 out of 25000 | Loss --> 0.488 | Grad_l2 --> 0.130 | Weights_l2 --> 47132.477 | Lr --> 0.000 | Seconds_per_step --> 1.780 | [2024-08-07 16:37:39,530][Main][INFO] - [train] Step 24750 out of 25000 | Loss --> 0.469 | Grad_l2 --> 0.131 | Weights_l2 --> 47132.340 | Lr --> 0.000 | Seconds_per_step --> 1.731 | [2024-08-07 16:39:05,855][Main][INFO] - [train] Step 24800 out of 25000 | Loss --> 0.504 | Grad_l2 --> 0.188 | Weights_l2 --> 47132.206 | Lr --> 0.000 | Seconds_per_step --> 1.726 | [2024-08-07 16:40:34,490][Main][INFO] - [train] Step 24850 out of 25000 | Loss --> 0.483 | Grad_l2 --> 0.132 | Weights_l2 --> 47132.065 | Lr --> 0.000 | Seconds_per_step --> 1.773 | [2024-08-07 16:42:00,929][Main][INFO] - [train] Step 24900 out of 25000 | Loss --> 0.499 | Grad_l2 --> 0.155 | Weights_l2 --> 47131.932 | Lr --> 0.000 | Seconds_per_step --> 1.729 | [2024-08-07 16:43:27,315][Main][INFO] - [train] Step 24950 out of 25000 | Loss --> 0.492 | Grad_l2 --> 0.138 | Weights_l2 --> 47131.794 | Lr --> 0.000 | Seconds_per_step --> 1.728 | [2024-08-07 16:44:55,907][Main][INFO] - [train] Step 25000 out of 25000 | Loss --> 0.469 | Grad_l2 --> 0.137 | Weights_l2 --> 47131.657 | Lr --> 0.000 | Seconds_per_step --> 1.772 | [2024-08-07 16:44:55,907][accelerate.accelerator][INFO] - Saving current state to checkpoint-ft-25000 [2024-08-07 16:44:55,913][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'lm_head.weight', 
'decoder.embed_tokens.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-07 16:44:56,709][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-ft-25000/model.safetensors [2024-08-07 16:44:57,851][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-ft-25000/optimizer.bin [2024-08-07 16:44:57,851][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-ft-25000/scheduler.bin [2024-08-07 16:44:57,851][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-ft-25000/sampler.bin [2024-08-07 16:44:57,851][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-ft-25000/sampler_1.bin [2024-08-07 16:44:57,852][accelerate.checkpointing][INFO] - Random states saved in checkpoint-ft-25000/random_states_0.pkl [2024-08-07 16:45:03,505][Main][INFO] - [eval] Step 25001 out of 25000 | Loss --> 0.794 | Accuracy --> 0.851 | Time --> 5.006 | [2024-08-07 16:49:32,893][absl][INFO] - Using default tokenizer. [2024-08-07 16:49:33,496][Main][INFO] - [test] Step 25001 out of 25000 | Rougel --> 25.834 | Time --> 269.990 | [2024-08-07 16:49:33,501][accelerate.accelerator][INFO] - Saving current state to checkpoint-ft-25001 [2024-08-07 16:49:33,508][accelerate.utils.other][WARNING] - Removed shared tensor {'encoder.embed_tokens.weight', 'lm_head.weight', 'decoder.embed_tokens.weight'} while saving. 
This should be OK, but check by verifying that you don't receive any warning while reloading [2024-08-07 16:49:34,345][accelerate.checkpointing][INFO] - Model weights saved in checkpoint-ft-25001/model.safetensors [2024-08-07 16:49:35,498][accelerate.checkpointing][INFO] - Optimizer state saved in checkpoint-ft-25001/optimizer.bin [2024-08-07 16:49:35,499][accelerate.checkpointing][INFO] - Scheduler state saved in checkpoint-ft-25001/scheduler.bin [2024-08-07 16:49:35,499][accelerate.checkpointing][INFO] - Sampler state for dataloader 0 saved in checkpoint-ft-25001/sampler.bin [2024-08-07 16:49:35,499][accelerate.checkpointing][INFO] - Sampler state for dataloader 1 saved in checkpoint-ft-25001/sampler_1.bin [2024-08-07 16:49:35,500][accelerate.checkpointing][INFO] - Random states saved in checkpoint-ft-25001/random_states_0.pkl