2023-03-27 21:39:09.060122: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
  Local host: 192-9-155-93
  Device name: mlx5_0
  Device vendor ID: 0x02c9
  Device vendor part ID: 4126
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
  Local host: 192-9-155-93
  Local device: mlx5_0
  Local port: 1
  CPCs attempted: udcm
--------------------------------------------------------------------------
/home/ubuntu/.local/lib/python3.8/site-packages/pandas/core/computation/expressions.py:20: UserWarning: Pandas requires version '2.7.3' or newer of 'numexpr' (version '2.7.1' currently installed).
  from pandas.core.computation.check import NUMEXPR_INSTALLED
/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(msg)
/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
Training Alpaca-LoRA model with params:
base_model: /home/ubuntu/models/llama_7B/
data_path: ./alpaca_data_cleaned.json
output_dir: ./lora-alpaca
batch_size: 128
micro_batch_size: 32
num_epochs: 1
learning_rate: 0.0003
cutoff_len: 256
val_set_size: 2000
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
group_by_length: False
resume_from_checkpoint: None
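The parameters above determine how many optimizer steps the run takes; the training progress bar in this log counts to 388. A minimal sketch of that arithmetic follows. It assumes a single training process and that optimizer steps per epoch are the number of micro-batches floor-divided by the accumulation factor; neither is stated in the log. The training-set size (49759) is taken from the `Map:` progress line of this log.

```python
import math

batch_size = 128          # from the log
micro_batch_size = 32     # from the log
num_epochs = 1            # from the log
train_examples = 49759    # from the "Map:" progress line in this log

# Gradient accumulation factor: micro-batches per optimizer step.
grad_accum_steps = batch_size // micro_batch_size

# Assumptions (not stated in the log): one process, the last partial
# micro-batch is kept, and optimizer steps per epoch are the number of
# micro-batches floor-divided by the accumulation factor.
micro_batches_per_epoch = math.ceil(train_examples / micro_batch_size)
steps_per_epoch = micro_batches_per_epoch // grad_accum_steps
total_steps = steps_per_epoch * num_epochs

print(grad_accum_steps, micro_batches_per_epoch, total_steps)
```

Under these assumptions the result agrees with the `/388` total shown by the training progress bar.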
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.54s/it]
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/json/default-0b8c16dfa3eac6d6/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
100%|██████████| 1/1 [00:00<00:00, 690.88it/s]
Loading cached split indices for dataset at /home/ubuntu/.cache/huggingface/datasets/json/default-0b8c16dfa3eac6d6/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-3d375a2c1633d7cd.arrow and /home/ubuntu/.cache/huggingface/datasets/json/default-0b8c16dfa3eac6d6/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-31adcadc050eed84.arrow
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
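The trainable-parameter count above can be checked by hand. The sketch below assumes the LLaMA-7B shape of 32 decoder layers with hidden size 4096 (these values are not in the log) and that LoRA adds two matrices per target module, A of shape r x d and B of shape d x r. The rank and target modules come from the parameter dump in this log.

```python
lora_r = 8                             # from the log
target_modules = ["q_proj", "v_proj"]  # from the log
all_params = 6742609920                # from the log

# Assumptions (not in the log): LLaMA-7B layer count and hidden size.
num_layers = 32
hidden_size = 4096

# Each adapted projection gets A (r x d) and B (d x r).
params_per_module = lora_r * hidden_size + hidden_size * lora_r
trainable = params_per_module * len(target_modules) * num_layers

# Frozen base-model parameters, by subtraction.
frozen = all_params - trainable
trainable_pct = 100 * trainable / all_params

print(trainable, frozen, trainable_pct)
```

With these assumptions the adapter count comes out to 4194304, matching the logged value, and the percentage is the same ratio the log reports.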
Map: 100%|██████████| 49737/49759 [00:33<00:00, 1536.10 examples/s]
Map: 95%|██████████| 1893/2000 [00:01<00:00, 1548.07 examples/s]
52%|██████ | 200/388 [1:47:19<1:40:02, 31.93s/it]
{'loss': 2.2847, 'learning_rate': 2.9999999999999997e-05, 'epoch': 0.03}
{'loss': 2.2109, 'learning_rate': 5.9999999999999995e-05, 'epoch': 0.05} | |
{'loss': 1.9736, 'learning_rate': 8.699999999999999e-05, 'epoch': 0.08} | |
{'loss': 1.5818, 'learning_rate': 0.000117, 'epoch': 0.1} | |
{'loss': 1.2586, 'learning_rate': 0.000147, 'epoch': 0.13} | |
{'loss': 1.1373, 'learning_rate': 0.00017699999999999997, 'epoch': 0.15} | |
{'loss': 0.9888, 'learning_rate': 0.00020699999999999996, 'epoch': 0.18} | |
{'loss': 0.8617, 'learning_rate': 0.000237, 'epoch': 0.21} | |
{'loss': 0.8454, 'learning_rate': 0.000267, 'epoch': 0.23} | |
{'loss': 0.8458, 'learning_rate': 0.00029699999999999996, 'epoch': 0.26} | |
{'loss': 0.8341, 'learning_rate': 0.000290625, 'epoch': 0.28} | |
{'loss': 0.8218, 'learning_rate': 0.0002802083333333333, 'epoch': 0.31} | |
{'loss': 0.8304, 'learning_rate': 0.00026979166666666666, 'epoch': 0.33} | |
{'loss': 0.8161, 'learning_rate': 0.000259375, 'epoch': 0.36} | |
{'loss': 0.8097, 'learning_rate': 0.00024895833333333334, 'epoch': 0.39} | |
{'loss': 0.8134, 'learning_rate': 0.00023854166666666663, 'epoch': 0.41} | |
{'loss': 0.8214, 'learning_rate': 0.00022812499999999997, 'epoch': 0.44} | |
{'loss': 0.8095, 'learning_rate': 0.00021770833333333332, 'epoch': 0.46} | |
{'loss': 0.8028, 'learning_rate': 0.00020729166666666663, 'epoch': 0.49} | |
{'loss': 0.8014, 'learning_rate': 0.00019687499999999997, 'epoch': 0.51} | |
0%| | 0/250 [00:00<?, ?it/s][A | |
100%|ββββββββββ| 250/250 [02:13<00:00, 1.82it/s][A | |
[A 52%|ββββββ | 200/388 [1:49:34<1:40:02, 31.93s/it] | |
100%|ββββββββββ| 250/250 [02:14<00:00, 1.82it/s][A | |
[A 52%|ββββββ | 201/388 [1:50:06<3:46:06, 72.55s/it] 52%|ββββββ | 202/388 [1:50:39<3:07:53, 60.61s/it] 52%|ββββββ | 203/388 [1:51:11<2:40:15, 51.97s/it] 53%|ββββββ | 204/388 [1:51:43<2:20:54, 45.95s/it] 53%|ββββββ | 205/388 [1:52:16<2:08:05, 42.00s/it] 53%|ββββββ | 206/388 [1:52:48<1:58:28, 39.06s/it] 53%|ββββββ | 207/388 [1:53:19<1:50:41, 36.69s/it] 54%|ββββββ | 208/388 [1:53:48<1:43:30, 34.50s/it] 54%|ββββββ | 209/388 [1:54:19<1:39:29, 33.35s/it] 54%|ββββββ | 210/388 [1:54:52<1:38:19, 33.14s/it] 54%|ββββββ | 210/388 [1:54:52<1:38:19, 33.14s/it] 54%|ββββββ | 211/388 [1:55:24<1:37:25, 33.02s/it] 55%|ββββββ | 212/388 [1:55:57<1:36:39, 32.95s/it] 55%|ββββββ | 213/388 [1:56:30<1:35:57, 32.90s/it] 55%|ββββββ | 214/388 [1:57:03<1:35:16, 32.85s/it] 55%|ββββββ | 215/388 [1:57:35<1:34:35, 32.80s/it] 56%|ββββββ | 216/388 [1:58:07<1:32:48, 32.37s/it] 56%|ββββββ | 217/388 [1:58:39<1:32:02, 32.29s/it] 56%|ββββββ | 218/388 [1:59:12<1:31:49, 32.41s/it] 56%|ββββββ | 219/388 [1:59:43<1:30:42, 32.20s/it] 57%|ββββββ | 220/388 [2:00:16<1:30:37, 32.37s/it] 57%|ββββββ | 220/388 [2:00:16<1:30:37, 32.37s/it] 57%|ββββββ | 221/388 [2:00:49<1:30:27, 32.50s/it] 57%|ββββββ | 222/388 [2:01:21<1:29:42, 32.42s/it] 57%|ββββββ | 223/388 [2:01:53<1:29:07, 32.41s/it] 58%|ββββββ | 224/388 [2:02:26<1:28:57, 32.55s/it] 58%|ββββββ | 225/388 [2:02:58<1:27:58, 32.38s/it] 58%|ββββββ | 226/388 [2:03:31<1:27:41, 32.48s/it] 59%|ββββββ | 227/388 [2:04:04<1:27:16, 32.52s/it] 59%|ββββββ | 228/388 [2:04:35<1:25:45, 32.16s/it] 59%|ββββββ | 229/388 [2:05:07<1:24:47, 32.00s/it] 59%|ββββββ | 230/388 [2:05:38<1:23:46, 31.81s/it] 59%|ββββββ | 230/388 [2:05:38<1:23:46, 31.81s/it] 60%|ββββββ | 231/388 [2:06:10<1:23:31, 31.92s/it] 60%|ββββββ | 232/388 [2:06:43<1:23:24, 32.08s/it] 60%|ββββββ | 233/388 [2:07:15<1:22:46, 32.04s/it] 60%|ββββββ | 234/388 [2:07:47<1:22:19, 32.08s/it] 61%|ββββββ | 235/388 [2:08:19<1:21:53, 32.12s/it] 61%|ββββββ | 236/388 [2:08:51<1:21:16, 32.08s/it] 61%|ββββββ | 237/388 [2:09:24<1:21:13, 
32.27s/it] 61%|βββββββ | 238/388 [2:09:56<1:20:59, 32.40s/it] 62%|βββββββ | 239/388 [2:10:29<1:20:33, 32.44s/it] 62%|βββββββ | 240/388 [2:11:01<1:20:03, 32.45s/it] 62%|βββββββ | 240/388 [2:11:01<1:20:03, 32.45s/it] 62%|βββββββ | 241/388 [2:11:33<1:19:08, 32.30s/it] 62%|βββββββ | 242/388 [2:12:06<1:18:41, 32.34s/it] 63%|βββββββ | 243/388 [2:12:38<1:18:15, 32.38s/it] 63%|βββββββ | 244/388 [2:13:11<1:18:02, 32.51s/it] 63%|βββββββ | 245/388 [2:13:44<1:17:34, 32.55s/it] 63%|βββββββ | 246/388 [2:14:16<1:17:06, 32.58s/it] 64%|βββββββ | 247/388 [2:14:48<1:16:05, 32.38s/it] 64%|βββββββ | 248/388 [2:15:21<1:15:48, 32.49s/it] 64%|βββββββ | 249/388 [2:15:53<1:14:49, 32.30s/it] 64%|βββββββ | 250/388 [2:16:25<1:14:04, 32.21s/it] 64%|βββββββ | 250/388 [2:16:25<1:14:04, 32.21s/it] 65%|βββββββ | 251/388 [2:16:57<1:13:39, 32.26s/it] 65%|βββββββ | 252/388 [2:17:30<1:13:27, 32.41s/it] 65%|βββββββ | 253/388 [2:18:02<1:12:34, 32.26s/it] 65%|βββββββ | 254/388 [2:18:32<1:10:46, 31.69s/it] 66%|βββββββ | 255/388 [2:19:05<1:10:46, 31.93s/it] 66%|βββββββ | 256/388 [2:19:36<1:09:58, 31.81s/it] 66%|βββββββ | 257/388 [2:20:09<1:09:56, 32.03s/it] 66%|βββββββ | 258/388 [2:20:41<1:09:12, 31.94s/it] 67%|βββββββ | 259/388 [2:21:13<1:09:09, 32.17s/it] 67%|βββββββ | 260/388 [2:21:45<1:08:07, 31.93s/it] 67%|βββββββ | 260/388 [2:21:45<1:08:07, 31.93s/it] 67%|βββββββ | 261/388 [2:22:17<1:08:09, 32.20s/it] 68%|βββββββ | 262/388 [2:22:50<1:07:57, 32.36s/it] 68%|βββββββ | 263/388 [2:23:23<1:07:39, 32.48s/it] 68%|βββββββ | 264/388 [2:23:54<1:06:20, 32.10s/it] 68%|βββββββ | 265/388 [2:24:24<1:04:43, 31.57s/it] 69%|βββββββ | 266/388 [2:24:57<1:04:54, 31.92s/it] 69%|βββββββ | 267/388 [2:25:30<1:04:50, 32.15s/it] 69%|βββββββ | 268/388 [2:26:02<1:04:31, 32.26s/it] 69%|βββββββ | 269/388 [2:26:34<1:03:38, 32.09s/it] 70%|βββββββ | 270/388 [2:27:07<1:03:35, 32.33s/it] 70%|βββββββ | 270/388 [2:27:07<1:03:35, 32.33s/it] 70%|βββββββ | 271/388 [2:27:39<1:02:56, 32.28s/it] 70%|βββββββ | 272/388 [2:28:10<1:01:47, 31.96s/it] 
70%|βββββββ | 273/388 [2:28:43<1:01:39, 32.17s/it] 71%|βββββββ | 274/388 [2:29:13<59:53, 31.52s/it] 71%|βββββββ | 275/388 [2:29:46<1:00:03, 31.89s/it] 71%|βββββββ | 276/388 [2:30:19<1:00:03, 32.17s/it] 71%|ββββββββ | 277/388 [2:30:50<59:15, 32.03s/it] 72%|ββββββββ | 278/388 [2:31:23<58:52, 32.11s/it] 72%|ββββββββ | 279/388 [2:31:55<58:40, 32.30s/it] 72%|ββββββββ | 280/388 [2:32:28<58:24, 32.45s/it] 72%|ββββββββ | 280/388 [2:32:28<58:24, 32.45s/it] 72%|ββββββββ | 281/388 [2:33:01<57:48, 32.42s/it] 73%|ββββββββ | 282/388 [2:33:33<57:25, 32.50s/it] 73%|ββββββββ | 283/388 [2:34:06<57:03, 32.60s/it] 73%|ββββββββ | 284/388 [2:34:39<56:30, 32.60s/it] 73%|ββββββββ | 285/388 [2:35:11<56:02, 32.64s/it] 74%|ββββββββ | 286/388 [2:35:44<55:13, 32.49s/it] 74%|ββββββββ | 287/388 [2:36:16<54:37, 32.45s/it] 74%|ββββββββ | 288/388 [2:36:48<53:41, 32.21s/it] 74%|ββββββββ | 289/388 [2:37:19<52:39, 31.91s/it] 75%|ββββββββ | 290/388 [2:37:52<52:32, 32.17s/it] 75%|ββββββββ | 290/388 [2:37:52<52:32, 32.17s/it] 75%|ββββββββ | 291/388 [2:38:24<52:19, 32.37s/it] 75%|ββββββββ | 292/388 [2:38:55<51:03, 31.91s/it] 76%|ββββββββ | 293/388 [2:39:26<50:07, 31.66s/it] 76%|ββββββββ | 294/388 [2:39:59<50:07, 31.99s/it] 76%|ββββββββ | 295/388 [2:40:30<48:52, 31.54s/it] 76%|ββββββββ | 296/388 [2:41:01<48:25, 31.58s/it] 77%|ββββββββ | 297/388 [2:41:34<48:28, 31.97s/it] 77%|ββββββββ | 298/388 [2:42:07<48:17, 32.19s/it] 77%|ββββββββ | 299/388 [2:42:40<48:00, 32.37s/it] 77%|ββββββββ | 300/388 [2:43:12<47:37, 32.47s/it] 77%|ββββββββ | 300/388 [2:43:12<47:37, 32.47s/it] 78%|ββββββββ | 301/388 [2:43:45<47:10, 32.54s/it] 78%|ββββββββ | 302/388 [2:44:15<45:36, 31.81s/it] 78%|ββββββββ | 303/388 [2:44:48<45:27, 32.09s/it] 78%|ββββββββ | 304/388 [2:45:20<44:53, 32.07s/it] 79%|ββββββββ | 305/388 [2:45:52<44:22, 32.08s/it] 79%|ββββββββ | 306/388 [2:46:24<43:48, 32.06s/it] 79%|ββββββββ | 307/388 [2:46:55<42:56, 31.81s/it] 79%|ββββββββ | 308/388 [2:47:27<42:14, 31.68s/it] 80%|ββββββββ | 309/388 [2:47:59<42:08, 
32.00s/it] 80%|ββββββββ | 310/388 [2:48:31<41:32, 31.96s/it] 80%|ββββββββ | 310/388 [2:48:31<41:32, 31.96s/it] 80%|ββββββββ | 311/388 [2:49:03<41:04, 32.01s/it] 80%|ββββββββ | 312/388 [2:49:36<40:49, 32.23s/it] 81%|ββββββββ | 313/388 [2:50:09<40:28, 32.38s/it] 81%|ββββββββ | 314/388 [2:50:41<40:02, 32.46s/it] 81%|ββββββββ | 315/388 [2:51:13<39:15, 32.26s/it] 81%|βββββββββ | 316/388 [2:51:46<38:53, 32.40s/it] 82%|βββββββββ | 317/388 [2:52:19<38:28, 32.51s/it] 82%|βββββββββ | 318/388 [2:52:50<37:38, 32.27s/it] 82%|βββββββββ | 319/388 [2:53:23<37:20, 32.46s/it] 82%|βββββββββ | 320/388 [2:53:56<36:55, 32.59s/it] 82%|βββββββββ | 320/388 [2:53:56<36:55, 32.59s/it] 83%|βββββββββ | 321/388 [2:54:29<36:24, 32.61s/it] 83%|βββββββββ | 322/388 [2:55:00<35:12, 32.01s/it] 83%|βββββββββ | 323/388 [2:55:31<34:34, 31.91s/it] 84%|βββββββββ | 324/388 [2:56:01<33:28, 31.38s/it] 84%|βββββββββ | 325/388 [2:56:34<33:23, 31.80s/it] 84%|βββββββββ | 326/388 [2:57:06<32:49, 31.77s/it] 84%|βββββββββ | 327/388 [2:57:37<32:03, 31.53s/it] 85%|βββββββββ | 328/388 [2:58:07<31:15, 31.25s/it] 85%|βββββββββ | 329/388 [2:58:40<31:08, 31.67s/it] 85%|βββββββββ | 330/388 [2:59:13<30:58, 32.04s/it] 85%|βββββββββ | 330/388 [2:59:13<30:58, 32.04s/it] 85%|βββββββββ | 331/388 [2:59:46<30:38, 32.25s/it] 86%|βββββββββ | 332/388 [3:00:18<30:10, 32.33s/it] 86%|βββββββββ | 333/388 [3:00:51<29:40, 32.37s/it] 86%|βββββββββ | 334/388 [3:01:23<29:09, 32.39s/it] 86%|βββββββββ | 335/388 [3:01:56<28:42, 32.49s/it] 87%|βββββββββ | 336/388 [3:02:27<27:53, 32.18s/it] 87%|βββββββββ | 337/388 [3:03:00<27:29, 32.35s/it] 87%|βββββββββ | 338/388 [3:03:33<27:01, 32.42s/it] 87%|βββββββββ | 339/388 [3:04:01<25:31, 31.26s/it] 88%|βββββββββ | 340/388 [3:04:33<25:01, 31.29s/it] 88%|βββββββββ | 340/388 [3:04:33<25:01, 31.29s/it] 88%|βββββββββ | 341/388 [3:05:04<24:39, 31.48s/it] 88%|βββββββββ | 342/388 [3:05:37<24:26, 31.88s/it] 88%|βββββββββ | 343/388 [3:06:09<23:58, 31.96s/it] 89%|βββββββββ | 344/388 [3:06:42<23:36, 32.19s/it] 
89%|βββββββββ | 345/388 [3:07:14<23:03, 32.18s/it] 89%|βββββββββ | 346/388 [3:07:47<22:38, 32.33s/it] 89%|βββββββββ | 347/388 [3:08:18<21:50, 31.95s/it] 90%|βββββββββ | 348/388 [3:08:51<21:28, 32.20s/it] 90%|βββββββββ | 349/388 [3:09:22<20:40, 31.80s/it] 90%|βββββββββ | 350/388 [3:09:54<20:19, 32.08s/it] 90%|βββββββββ | 350/388 [3:09:54<20:19, 32.08s/it] 90%|βββββββββ | 351/388 [3:10:26<19:36, 31.81s/it] 91%|βββββββββ | 352/388 [3:10:58<19:15, 32.10s/it] 91%|βββββββββ | 353/388 [3:11:31<18:51, 32.32s/it] 91%|βββββββββ | 354/388 [3:12:04<18:21, 32.40s/it] 91%|ββββββββββ| 355/388 [3:12:34<17:24, 31.65s/it] 92%|ββββββββββ| 356/388 [3:13:07<17:04, 32.01s/it] 92%|ββββββββββ| 357/388 [3:13:38<16:29, 31.92s/it] 92%|ββββββββββ| 358/388 [3:14:10<15:58, 31.94s/it] 93%|ββββββββββ| 359/388 [3:14:43<15:32, 32.17s/it] 93%|ββββββββββ| 360/388 [3:15:13<14:43, 31.56s/it] 93%|ββββββββββ| 360/388 [3:15:13<14:43, 31.56s/it] 93%|ββββββββββ| 361/388 [3:15:46<14:21, 31.92s/it] 93%|ββββββββββ| 362/388 [3:16:19<13:57, 32.21s/it] 94%|ββββββββββ| 363/388 [3:16:51<13:29, 32.36s/it] 94%|ββββββββββ| 364/388 [3:17:24<12:59, 32.48s/it] 94%|ββββββββββ| 365/388 [3:17:57<12:27, 32.49s/it] 94%|ββββββββββ| 366/388 [3:18:29<11:55, 32.55s/it] 95%|ββββββββββ| 367/388 [3:19:01<11:19, 32.34s/it] 95%|ββββββββββ| 368/388 [3:19:32<10:39, 31.98s/it] 95%|ββββββββββ| 369/388 [3:20:05<10:10, 32.11s/it] 95%|ββββββββββ| 370/388 [3:20:37<09:38, 32.16s/it] 95%|ββββββββββ| 370/388 [3:20:37<09:38, 32.16s/it] 96%|ββββββββββ| 371/388 [3:21:10<09:08, 32.27s/it] 96%|ββββββββββ| 372/388 [3:21:42<08:37, 32.32s/it] 96%|ββββββββββ| 373/388 [3:22:14<08:04, 32.29s/it] 96%|ββββββββββ| 374/388 [3:22:44<07:22, 31.58s/it] 97%|ββββββββββ| 375/388 [3:23:15<06:48, 31.43s/it] 97%|ββββββββββ| 376/388 [3:23:48<06:21, 31.76s/it] 97%|ββββββββββ| 377/388 [3:24:19<05:46, 31.54s/it] 97%|ββββββββββ| 378/388 [3:24:52<05:19, 31.91s/it] 98%|ββββββββββ| 379/388 [3:25:24<04:48, 32.09s/it] 98%|ββββββββββ| 380/388 [3:25:57<04:18, 32.31s/it] 
98%|ββββββββββ| 380/388 [3:25:57<04:18, 32.31s/it] 98%|ββββββββββ| 381/388 [3:26:29<03:46, 32.37s/it] 98%|ββββββββββ| 382/388 [3:27:02<03:14, 32.46s/it] 99%|ββββββββββ| 383/388 [3:27:33<02:39, 31.94s/it] 99%|ββββββββββ| 384/388 [3:28:06<02:08, 32.19s/it] 99%|ββββββββββ| 385/388 [3:28:37<01:36, 32.07s/it] 99%|ββββββββββ| 386/388 [3:29:10<01:04, 32.27s/it] 100%|ββββββββββ| 387/388 [3:29:42<00:32, 32.09s/it] 100%|ββββββββββ| 388/388 [3:30:15<00:00, 32.27s/it]There were missing keys in the checkpoint model loaded: ['base_model.model.model.embed_tokens.weight', 'base_model.model.model.layers.0.self_attn.q_proj.weight', 'base_model.model.model.layers.0.self_attn.k_proj.weight', 'base_model.model.model.layers.0.self_attn.v_proj.weight', 'base_model.model.model.layers.0.self_attn.o_proj.weight', 'base_model.model.model.layers.0.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.0.mlp.gate_proj.weight', 'base_model.model.model.layers.0.mlp.down_proj.weight', 'base_model.model.model.layers.0.mlp.up_proj.weight', 'base_model.model.model.layers.0.input_layernorm.weight', 'base_model.model.model.layers.0.post_attention_layernorm.weight', 'base_model.model.model.layers.1.self_attn.q_proj.weight', 'base_model.model.model.layers.1.self_attn.k_proj.weight', 'base_model.model.model.layers.1.self_attn.v_proj.weight', 'base_model.model.model.layers.1.self_attn.o_proj.weight', 'base_model.model.model.layers.1.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.1.mlp.gate_proj.weight', 'base_model.model.model.layers.1.mlp.down_proj.weight', 'base_model.model.model.layers.1.mlp.up_proj.weight', 'base_model.model.model.layers.1.input_layernorm.weight', 'base_model.model.model.layers.1.post_attention_layernorm.weight', 'base_model.model.model.layers.2.self_attn.q_proj.weight', 'base_model.model.model.layers.2.self_attn.k_proj.weight', 'base_model.model.model.layers.2.self_attn.v_proj.weight', 'base_model.model.model.layers.2.self_attn.o_proj.weight', 
'base_model.model.model.layers.2.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.2.mlp.gate_proj.weight', 'base_model.model.model.layers.2.mlp.down_proj.weight', 'base_model.model.model.layers.2.mlp.up_proj.weight', 'base_model.model.model.layers.2.input_layernorm.weight', 'base_model.model.model.layers.2.post_attention_layernorm.weight', 'base_model.model.model.layers.3.self_attn.q_proj.weight', 'base_model.model.model.layers.3.self_attn.k_proj.weight', 'base_model.model.model.layers.3.self_attn.v_proj.weight', 'base_model.model.model.layers.3.self_attn.o_proj.weight', 'base_model.model.model.layers.3.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.3.mlp.gate_proj.weight', 'base_model.model.model.layers.3.mlp.down_proj.weight', 'base_model.model.model.layers.3.mlp.up_proj.weight', 'base_model.model.model.layers.3.input_layernorm.weight', 'base_model.model.model.layers.3.post_attention_layernorm.weight', 'base_model.model.model.layers.4.self_attn.q_proj.weight', 'base_model.model.model.layers.4.self_attn.k_proj.weight', 'base_model.model.model.layers.4.self_attn.v_proj.weight', 'base_model.model.model.layers.4.self_attn.o_proj.weight', 'base_model.model.model.layers.4.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.4.mlp.gate_proj.weight', 'base_model.model.model.layers.4.mlp.down_proj.weight', 'base_model.model.model.layers.4.mlp.up_proj.weight', 'base_model.model.model.layers.4.input_layernorm.weight', 'base_model.model.model.layers.4.post_attention_layernorm.weight', 'base_model.model.model.layers.5.self_attn.q_proj.weight', 'base_model.model.model.layers.5.self_attn.k_proj.weight', 'base_model.model.model.layers.5.self_attn.v_proj.weight', 'base_model.model.model.layers.5.self_attn.o_proj.weight', 'base_model.model.model.layers.5.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.5.mlp.gate_proj.weight', 'base_model.model.model.layers.5.mlp.down_proj.weight', 
'base_model.model.model.layers.5.mlp.up_proj.weight', 'base_model.model.model.layers.5.input_layernorm.weight', 'base_model.model.model.layers.5.post_attention_layernorm.weight', 'base_model.model.model.layers.6.self_attn.q_proj.weight', 'base_model.model.model.layers.6.self_attn.k_proj.weight', 'base_model.model.model.layers.6.self_attn.v_proj.weight', 'base_model.model.model.layers.6.self_attn.o_proj.weight', 'base_model.model.model.layers.6.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.6.mlp.gate_proj.weight', 'base_model.model.model.layers.6.mlp.down_proj.weight', 'base_model.model.model.layers.6.mlp.up_proj.weight', 'base_model.model.model.layers.6.input_layernorm.weight', 'base_model.model.model.layers.6.post_attention_layernorm.weight', 'base_model.model.model.layers.7.self_attn.q_proj.weight', 'base_model.model.model.layers.7.self_attn.k_proj.weight', 'base_model.model.model.layers.7.self_attn.v_proj.weight', 'base_model.model.model.layers.7.self_attn.o_proj.weight', 'base_model.model.model.layers.7.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.7.mlp.gate_proj.weight', 'base_model.model.model.layers.7.mlp.down_proj.weight', 'base_model.model.model.layers.7.mlp.up_proj.weight', 'base_model.model.model.layers.7.input_layernorm.weight', 'base_model.model.model.layers.7.post_attention_layernorm.weight', 'base_model.model.model.layers.8.self_attn.q_proj.weight', 'base_model.model.model.layers.8.self_attn.k_proj.weight', 'base_model.model.model.layers.8.self_attn.v_proj.weight', 'base_model.model.model.layers.8.self_attn.o_proj.weight', 'base_model.model.model.layers.8.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.8.mlp.gate_proj.weight', 'base_model.model.model.layers.8.mlp.down_proj.weight', 'base_model.model.model.layers.8.mlp.up_proj.weight', 'base_model.model.model.layers.8.input_layernorm.weight', 'base_model.model.model.layers.8.post_attention_layernorm.weight', 
'base_model.model.model.layers.9.self_attn.q_proj.weight', 'base_model.model.model.layers.9.self_attn.k_proj.weight', 'base_model.model.model.layers.9.self_attn.v_proj.weight', 'base_model.model.model.layers.9.self_attn.o_proj.weight', 'base_model.model.model.layers.9.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.9.mlp.gate_proj.weight', 'base_model.model.model.layers.9.mlp.down_proj.weight', 'base_model.model.model.layers.9.mlp.up_proj.weight', 'base_model.model.model.layers.9.input_layernorm.weight', 'base_model.model.model.layers.9.post_attention_layernorm.weight', 'base_model.model.model.layers.10.self_attn.q_proj.weight', 'base_model.model.model.layers.10.self_attn.k_proj.weight', 'base_model.model.model.layers.10.self_attn.v_proj.weight', 'base_model.model.model.layers.10.self_attn.o_proj.weight', 'base_model.model.model.layers.10.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.10.mlp.gate_proj.weight', 'base_model.model.model.layers.10.mlp.down_proj.weight', 'base_model.model.model.layers.10.mlp.up_proj.weight', 'base_model.model.model.layers.10.input_layernorm.weight', 'base_model.model.model.layers.10.post_attention_layernorm.weight', 'base_model.model.model.layers.11.self_attn.q_proj.weight', 'base_model.model.model.layers.11.self_attn.k_proj.weight', 'base_model.model.model.layers.11.self_attn.v_proj.weight', 'base_model.model.model.layers.11.self_attn.o_proj.weight', 'base_model.model.model.layers.11.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.11.mlp.gate_proj.weight', 'base_model.model.model.layers.11.mlp.down_proj.weight', 'base_model.model.model.layers.11.mlp.up_proj.weight', 'base_model.model.model.layers.11.input_layernorm.weight', 'base_model.model.model.layers.11.post_attention_layernorm.weight', 'base_model.model.model.layers.12.self_attn.q_proj.weight', 'base_model.model.model.layers.12.self_attn.k_proj.weight', 'base_model.model.model.layers.12.self_attn.v_proj.weight', 
'base_model.model.model.layers.12.self_attn.o_proj.weight', 'base_model.model.model.layers.12.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.12.mlp.gate_proj.weight', 'base_model.model.model.layers.12.mlp.down_proj.weight', 'base_model.model.model.layers.12.mlp.up_proj.weight', 'base_model.model.model.layers.12.input_layernorm.weight', 'base_model.model.model.layers.12.post_attention_layernorm.weight', 'base_model.model.model.layers.13.self_attn.q_proj.weight', 'base_model.model.model.layers.13.self_attn.k_proj.weight', 'base_model.model.model.layers.13.self_attn.v_proj.weight', 'base_model.model.model.layers.13.self_attn.o_proj.weight', 'base_model.model.model.layers.13.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.13.mlp.gate_proj.weight', 'base_model.model.model.layers.13.mlp.down_proj.weight', 'base_model.model.model.layers.13.mlp.up_proj.weight', 'base_model.model.model.layers.13.input_layernorm.weight', 'base_model.model.model.layers.13.post_attention_layernorm.weight', 'base_model.model.model.layers.14.self_attn.q_proj.weight', 'base_model.model.model.layers.14.self_attn.k_proj.weight', 'base_model.model.model.layers.14.self_attn.v_proj.weight', 'base_model.model.model.layers.14.self_attn.o_proj.weight', 'base_model.model.model.layers.14.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.14.mlp.gate_proj.weight', 'base_model.model.model.layers.14.mlp.down_proj.weight', 'base_model.model.model.layers.14.mlp.up_proj.weight', 'base_model.model.model.layers.14.input_layernorm.weight', 'base_model.model.model.layers.14.post_attention_layernorm.weight', 'base_model.model.model.layers.15.self_attn.q_proj.weight', 'base_model.model.model.layers.15.self_attn.k_proj.weight', 'base_model.model.model.layers.15.self_attn.v_proj.weight', 'base_model.model.model.layers.15.self_attn.o_proj.weight', 'base_model.model.model.layers.15.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.15.mlp.gate_proj.weight', 
'base_model.model.model.layers.15.mlp.down_proj.weight', 'base_model.model.model.layers.15.mlp.up_proj.weight', 'base_model.model.model.layers.15.input_layernorm.weight', 'base_model.model.model.layers.15.post_attention_layernorm.weight', 'base_model.model.model.layers.16.self_attn.q_proj.weight', 'base_model.model.model.layers.16.self_attn.k_proj.weight', 'base_model.model.model.layers.16.self_attn.v_proj.weight', 'base_model.model.model.layers.16.self_attn.o_proj.weight', 'base_model.model.model.layers.16.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.16.mlp.gate_proj.weight', 'base_model.model.model.layers.16.mlp.down_proj.weight', 'base_model.model.model.layers.16.mlp.up_proj.weight', 'base_model.model.model.layers.16.input_layernorm.weight', 'base_model.model.model.layers.16.post_attention_layernorm.weight', 'base_model.model.model.layers.17.self_attn.q_proj.weight', 'base_model.model.model.layers.17.self_attn.k_proj.weight', 'base_model.model.model.layers.17.self_attn.v_proj.weight', 'base_model.model.model.layers.17.self_attn.o_proj.weight', 'base_model.model.model.layers.17.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.17.mlp.gate_proj.weight', 'base_model.model.model.layers.17.mlp.down_proj.weight', 'base_model.model.model.layers.17.mlp.up_proj.weight', 'base_model.model.model.layers.17.input_layernorm.weight', 'base_model.model.model.layers.17.post_attention_layernorm.weight', 'base_model.model.model.layers.18.self_attn.q_proj.weight', 'base_model.model.model.layers.18.self_attn.k_proj.weight', 'base_model.model.model.layers.18.self_attn.v_proj.weight', 'base_model.model.model.layers.18.self_attn.o_proj.weight', 'base_model.model.model.layers.18.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.18.mlp.gate_proj.weight', 'base_model.model.model.layers.18.mlp.down_proj.weight', 'base_model.model.model.layers.18.mlp.up_proj.weight', 'base_model.model.model.layers.18.input_layernorm.weight', 
'base_model.model.model.layers.18.post_attention_layernorm.weight', 'base_model.model.model.layers.19.self_attn.q_proj.weight', 'base_model.model.model.layers.19.self_attn.k_proj.weight', 'base_model.model.model.layers.19.self_attn.v_proj.weight', 'base_model.model.model.layers.19.self_attn.o_proj.weight', 'base_model.model.model.layers.19.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.19.mlp.gate_proj.weight', 'base_model.model.model.layers.19.mlp.down_proj.weight', 'base_model.model.model.layers.19.mlp.up_proj.weight', 'base_model.model.model.layers.19.input_layernorm.weight', 'base_model.model.model.layers.19.post_attention_layernorm.weight', 'base_model.model.model.layers.20.self_attn.q_proj.weight', 'base_model.model.model.layers.20.self_attn.k_proj.weight', 'base_model.model.model.layers.20.self_attn.v_proj.weight', 'base_model.model.model.layers.20.self_attn.o_proj.weight', 'base_model.model.model.layers.20.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.20.mlp.gate_proj.weight', 'base_model.model.model.layers.20.mlp.down_proj.weight', 'base_model.model.model.layers.20.mlp.up_proj.weight', 'base_model.model.model.layers.20.input_layernorm.weight', 'base_model.model.model.layers.20.post_attention_layernorm.weight', 'base_model.model.model.layers.21.self_attn.q_proj.weight', 'base_model.model.model.layers.21.self_attn.k_proj.weight', 'base_model.model.model.layers.21.self_attn.v_proj.weight', 'base_model.model.model.layers.21.self_attn.o_proj.weight', 'base_model.model.model.layers.21.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.21.mlp.gate_proj.weight', 'base_model.model.model.layers.21.mlp.down_proj.weight', 'base_model.model.model.layers.21.mlp.up_proj.weight', 'base_model.model.model.layers.21.input_layernorm.weight', 'base_model.model.model.layers.21.post_attention_layernorm.weight', 'base_model.model.model.layers.22.self_attn.q_proj.weight', 'base_model.model.model.layers.22.self_attn.k_proj.weight', 
'base_model.model.model.layers.22.self_attn.v_proj.weight', 'base_model.model.model.layers.22.self_attn.o_proj.weight', 'base_model.model.model.layers.22.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.22.mlp.gate_proj.weight', 'base_model.model.model.layers.22.mlp.down_proj.weight', 'base_model.model.model.layers.22.mlp.up_proj.weight', 'base_model.model.model.layers.22.input_layernorm.weight', 'base_model.model.model.layers.22.post_attention_layernorm.weight', 'base_model.model.model.layers.23.self_attn.q_proj.weight', 'base_model.model.model.layers.23.self_attn.k_proj.weight', 'base_model.model.model.layers.23.self_attn.v_proj.weight', 'base_model.model.model.layers.23.self_attn.o_proj.weight', 'base_model.model.model.layers.23.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.23.mlp.gate_proj.weight', 'base_model.model.model.layers.23.mlp.down_proj.weight', 'base_model.model.model.layers.23.mlp.up_proj.weight', 'base_model.model.model.layers.23.input_layernorm.weight', 'base_model.model.model.layers.23.post_attention_layernorm.weight', 'base_model.model.model.layers.24.self_attn.q_proj.weight', 'base_model.model.model.layers.24.self_attn.k_proj.weight', 'base_model.model.model.layers.24.self_attn.v_proj.weight', 'base_model.model.model.layers.24.self_attn.o_proj.weight', 'base_model.model.model.layers.24.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.24.mlp.gate_proj.weight', 'base_model.model.model.layers.24.mlp.down_proj.weight', 'base_model.model.model.layers.24.mlp.up_proj.weight', 'base_model.model.model.layers.24.input_layernorm.weight', 'base_model.model.model.layers.24.post_attention_layernorm.weight', 'base_model.model.model.layers.25.self_attn.q_proj.weight', 'base_model.model.model.layers.25.self_attn.k_proj.weight', 'base_model.model.model.layers.25.self_attn.v_proj.weight', 'base_model.model.model.layers.25.self_attn.o_proj.weight', 'base_model.model.model.layers.25.self_attn.rotary_emb.inv_freq', 
'base_model.model.model.layers.25.mlp.gate_proj.weight', 'base_model.model.model.layers.25.mlp.down_proj.weight', 'base_model.model.model.layers.25.mlp.up_proj.weight', 'base_model.model.model.layers.25.input_layernorm.weight', 'base_model.model.model.layers.25.post_attention_layernorm.weight', 'base_model.model.model.layers.26.self_attn.q_proj.weight', 'base_model.model.model.layers.26.self_attn.k_proj.weight', 'base_model.model.model.layers.26.self_attn.v_proj.weight', 'base_model.model.model.layers.26.self_attn.o_proj.weight', 'base_model.model.model.layers.26.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.26.mlp.gate_proj.weight', 'base_model.model.model.layers.26.mlp.down_proj.weight', 'base_model.model.model.layers.26.mlp.up_proj.weight', 'base_model.model.model.layers.26.input_layernorm.weight', 'base_model.model.model.layers.26.post_attention_layernorm.weight', 'base_model.model.model.layers.27.self_attn.q_proj.weight', 'base_model.model.model.layers.27.self_attn.k_proj.weight', 'base_model.model.model.layers.27.self_attn.v_proj.weight', 'base_model.model.model.layers.27.self_attn.o_proj.weight', 'base_model.model.model.layers.27.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.27.mlp.gate_proj.weight', 'base_model.model.model.layers.27.mlp.down_proj.weight', 'base_model.model.model.layers.27.mlp.up_proj.weight', 'base_model.model.model.layers.27.input_layernorm.weight', 'base_model.model.model.layers.27.post_attention_layernorm.weight', 'base_model.model.model.layers.28.self_attn.q_proj.weight', 'base_model.model.model.layers.28.self_attn.k_proj.weight', 'base_model.model.model.layers.28.self_attn.v_proj.weight', 'base_model.model.model.layers.28.self_attn.o_proj.weight', 'base_model.model.model.layers.28.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.28.mlp.gate_proj.weight', 'base_model.model.model.layers.28.mlp.down_proj.weight', 'base_model.model.model.layers.28.mlp.up_proj.weight', 
'base_model.model.model.layers.28.input_layernorm.weight', 'base_model.model.model.layers.28.post_attention_layernorm.weight', 'base_model.model.model.layers.29.self_attn.q_proj.weight', 'base_model.model.model.layers.29.self_attn.k_proj.weight', 'base_model.model.model.layers.29.self_attn.v_proj.weight', 'base_model.model.model.layers.29.self_attn.o_proj.weight', 'base_model.model.model.layers.29.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.29.mlp.gate_proj.weight', 'base_model.model.model.layers.29.mlp.down_proj.weight', 'base_model.model.model.layers.29.mlp.up_proj.weight', 'base_model.model.model.layers.29.input_layernorm.weight', 'base_model.model.model.layers.29.post_attention_layernorm.weight', 'base_model.model.model.layers.30.self_attn.q_proj.weight', 'base_model.model.model.layers.30.self_attn.k_proj.weight', 'base_model.model.model.layers.30.self_attn.v_proj.weight', 'base_model.model.model.layers.30.self_attn.o_proj.weight', 'base_model.model.model.layers.30.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.30.mlp.gate_proj.weight', 'base_model.model.model.layers.30.mlp.down_proj.weight', 'base_model.model.model.layers.30.mlp.up_proj.weight', 'base_model.model.model.layers.30.input_layernorm.weight', 'base_model.model.model.layers.30.post_attention_layernorm.weight', 'base_model.model.model.layers.31.self_attn.q_proj.weight', 'base_model.model.model.layers.31.self_attn.k_proj.weight', 'base_model.model.model.layers.31.self_attn.v_proj.weight', 'base_model.model.model.layers.31.self_attn.o_proj.weight', 'base_model.model.model.layers.31.self_attn.rotary_emb.inv_freq', 'base_model.model.model.layers.31.mlp.gate_proj.weight', 'base_model.model.model.layers.31.mlp.down_proj.weight', 'base_model.model.model.layers.31.mlp.up_proj.weight', 'base_model.model.model.layers.31.input_layernorm.weight', 'base_model.model.model.layers.31.post_attention_layernorm.weight', 'base_model.model.model.norm.weight', 
'base_model.model.lm_head.0.weight']. | |
100%|██████████| 388/388 [3:30:15<00:00, 32.27s/it] 100%|██████████| 388/388 [3:30:15<00:00, 32.51s/it] | |
{'eval_loss': 0.8030890822410583, 'eval_runtime': 134.5586, 'eval_samples_per_second': 14.863, 'eval_steps_per_second': 1.858, 'epoch': 0.51} | |
{'loss': 0.8163, 'learning_rate': 0.00018645833333333332, 'epoch': 0.54} | |
{'loss': 0.8108, 'learning_rate': 0.00017604166666666666, 'epoch': 0.57} | |
{'loss': 0.8037, 'learning_rate': 0.000165625, 'epoch': 0.59} | |
{'loss': 0.8034, 'learning_rate': 0.00015520833333333334, 'epoch': 0.62} | |
{'loss': 0.8068, 'learning_rate': 0.00014479166666666666, 'epoch': 0.64} | |
{'loss': 0.7987, 'learning_rate': 0.000134375, 'epoch': 0.67} | |
{'loss': 0.8053, 'learning_rate': 0.00012395833333333332, 'epoch': 0.69} | |
{'loss': 0.8025, 'learning_rate': 0.00011354166666666666, 'epoch': 0.72} | |
{'loss': 0.8061, 'learning_rate': 0.00010312499999999999, 'epoch': 0.75} | |
{'loss': 0.8152, 'learning_rate': 9.270833333333333e-05, 'epoch': 0.77} | |
{'loss': 0.7909, 'learning_rate': 8.229166666666667e-05, 'epoch': 0.8} | |
{'loss': 0.8092, 'learning_rate': 7.1875e-05, 'epoch': 0.82} | |
{'loss': 0.7899, 'learning_rate': 6.145833333333333e-05, 'epoch': 0.85} | |
{'loss': 0.7995, 'learning_rate': 5.104166666666666e-05, 'epoch': 0.87} | |
{'loss': 0.7884, 'learning_rate': 4.062499999999999e-05, 'epoch': 0.9} | |
{'loss': 0.8031, 'learning_rate': 3.020833333333333e-05, 'epoch': 0.93} | |
{'loss': 0.8085, 'learning_rate': 1.9791666666666665e-05, 'epoch': 0.95} | |
{'loss': 0.8, 'learning_rate': 9.375e-06, 'epoch': 0.98} | |
{'train_runtime': 12615.0261, 'train_samples_per_second': 3.944, 'train_steps_per_second': 0.031, 'train_loss': 0.9600355195015976, 'epoch': 1.0} | |
If there's a warning about missing keys above, please disregard :) | |