diff --git "a/logs.txt" "b/logs.txt"
new file mode 100644
--- /dev/null
+++ "b/logs.txt"
@@ -0,0 +1,601 @@
+/Users/cfruan/miniconda3/envs/mlc-chat-venv/bin/python -m mlc_llm gen_config /Users/Shared/models/Meta-Llama-3.1-70B-Instruct --quantization q0f16 --conv-template llama-3_1 --output local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
+[2024-07-23 17:43:51] INFO auto_config.py:116: [92mFound[0m model configuration: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/config.json
+[2024-07-23 17:43:51] INFO auto_config.py:154: [92mFound[0m model type: [1mllama[0m. Use `--model-type` to override.
+[2024-07-23 17:43:51] INFO llama_model.py:62: [1mcontext_window_size[0m not found in config.json. Falling back to [1mmax_position_embeddings[0m (131072)
+[2024-07-23 17:43:51] INFO llama_model.py:82: [1mprefill_chunk_size[0m defaults to 2048
+[2024-07-23 17:43:51] INFO config.py:107: Overriding [1mmax_batch_size[0m from 1 to 80
+[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting [1mbos_token_id[0m: 128000
+[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting [1meos_token_id[0m: [128001, 128008, 128009]
+[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting [1mtemperature[0m: 0.6
+[2024-07-23 17:43:51] INFO gen_config.py:144: [generation_config.json] Setting [1mtop_p[0m: 0.9
+[2024-07-23 17:43:51] INFO gen_config.py:158: [91mNot found[0m tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/tokenizer.model
+[2024-07-23 17:43:51] INFO gen_config.py:156: [92mFound[0m tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/tokenizer.json. Copying to [1mlocal_dir/Llama-3.1-70B-Instruct-q0f16-MLC/tokenizer.json[0m
+[2024-07-23 17:43:51] INFO gen_config.py:158: [91mNot found[0m tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/vocab.json
+[2024-07-23 17:43:51] INFO gen_config.py:158: [91mNot found[0m tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/merges.txt
+[2024-07-23 17:43:51] INFO gen_config.py:158: [91mNot found[0m tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/added_tokens.json
+[2024-07-23 17:43:51] INFO gen_config.py:156: [92mFound[0m tokenizer config: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/tokenizer_config.json. Copying to [1mlocal_dir/Llama-3.1-70B-Instruct-q0f16-MLC/tokenizer_config.json[0m
+[2024-07-23 17:43:51] INFO gen_config.py:217: Detected tokenizer info: {'token_postproc_method': 'byte_level', 'prepend_space_in_encode': False, 'strip_space_in_decode': False}
+[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting [1mpad_token_id[0m: 0
+[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting [1mpresence_penalty[0m: 0.0
+[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting [1mfrequency_penalty[0m: 0.0
+[2024-07-23 17:43:51] INFO gen_config.py:32: [System default] Setting [1mrepetition_penalty[0m: 1.0
+[2024-07-23 17:43:51] INFO gen_config.py:245: Dumping configuration file to: [1mlocal_dir/Llama-3.1-70B-Instruct-q0f16-MLC/mlc-chat-config.json[0m
+/Users/cfruan/miniconda3/envs/mlc-chat-venv/bin/python -m mlc_llm convert_weight /Users/Shared/models/Meta-Llama-3.1-70B-Instruct --quantization q0f16 --output local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
+[2024-07-23 17:43:52] INFO auto_config.py:116: [92mFound[0m model configuration: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/config.json
+[2024-07-23 17:43:52] INFO auto_device.py:88: [91mNot found[0m device: cuda:0
+[2024-07-23 17:43:53] INFO auto_device.py:88: [91mNot found[0m device: rocm:0
+[2024-07-23 17:43:54] INFO auto_device.py:79: [92mFound[0m device: metal:0
+[2024-07-23 17:43:55] INFO auto_device.py:88: [91mNot found[0m device: vulkan:0
+[2024-07-23 17:43:55] INFO auto_device.py:88: [91mNot found[0m device: opencl:0
+[2024-07-23 17:43:55] INFO auto_device.py:35: Using device: [1mmetal:0[0m
+[2024-07-23 17:43:55] INFO auto_weight.py:71: Finding weights in: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct
+[2024-07-23 17:43:55] INFO auto_weight.py:137: [91mNot found[0m Huggingface PyTorch
+[2024-07-23 17:43:55] INFO auto_weight.py:144: [92mFound[0m source weight format: huggingface-safetensor. Source configuration: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json
+[2024-07-23 17:43:55] INFO auto_weight.py:107: Using source weight configuration: [1m/Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json[0m. Use `--source` to override.
+[2024-07-23 17:43:55] INFO auto_weight.py:111: Using source weight format: [1mhuggingface-safetensor[0m. Use `--source-format` to override.
+[2024-07-23 17:43:55] INFO auto_config.py:154: [92mFound[0m model type: [1mllama[0m. Use `--model-type` to override.
+[2024-07-23 17:43:55] INFO llama_model.py:62: [1mcontext_window_size[0m not found in config.json. Falling back to [1mmax_position_embeddings[0m (131072)
+[2024-07-23 17:43:55] INFO llama_model.py:82: [1mprefill_chunk_size[0m defaults to 2048
+[1mWeight conversion with arguments:[0m
+  [1m--config[0m          /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/config.json
+  [1m--quantization[0m    NoQuantize(name='q0f16', kind='no-quant', model_dtype='float16')
+  [1m--model-type[0m      llama
+  [1m--device[0m          metal:0
+  [1m--source[0m          /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model.safetensors.index.json
+  [1m--source-format[0m   huggingface-safetensor
+  [1m--output[0m          local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
+Start storing to cache local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
+  0%|          | 0/483 [00:00<?, ?it/s]                                       [2024-07-23 17:44:00] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00030-of-00030.safetensors
+  0%|          | 0/483 [00:00<?, ?it/s]                                       [2024-07-23 17:44:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mlm_head.weight[0m", shape: (128256, 8192), dtype: float16
+  0%|          | 0/483 [00:04<?, ?it/s]  0%|          | 1/483 [00:08<1:09:33,  8.66s/it]                                                 [2024-07-23 17:44:09] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00030-of-00030.safetensors
+  0%|          | 1/483 [00:08<1:09:33,  8.66s/it]                                                 [2024-07-23 17:44:09] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00001-of-00030.safetensors
+  0%|          | 1/483 [00:08<1:09:33,  8.66s/it]                                                 [2024-07-23 17:44:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.embed_tokens.weight[0m", shape: (128256, 8192), dtype: float16
+  0%|          | 1/483 [00:15<1:09:33,  8.66s/it]  0%|          | 2/483 [00:19<1:20:25, 10.03s/it]                                                 [2024-07-23 17:44:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.0.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+  0%|          | 2/483 [00:19<1:20:25, 10.03s/it]  1%|          | 3/483 [00:19<44:16,  5.53s/it]                                                 [2024-07-23 17:44:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.0.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+  1%|          | 3/483 [00:20<44:16,  5.53s/it]  1%|          | 4/483 [00:21<31:43,  3.97s/it]                                               [2024-07-23 17:44:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.0.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+  1%|          | 4/483 [00:23<31:43,  3.97s/it]  1%|          | 5/483 [00:25<30:36,  3.84s/it]                                               [2024-07-23 17:44:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.0.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+  1%|          | 5/483 [00:25<30:36,  3.84s/it]  1%|          | 6/483 [00:25<20:27,  2.57s/it]                                               [2024-07-23 17:44:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.0.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+  1%|          | 6/483 [00:25<20:27,  2.57s/it]  1%|▏         | 7/483 [00:25<15:19,  1.93s/it]                                               [2024-07-23 17:44:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.0.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+  1%|▏         | 7/483 [00:25<15:19,  1.93s/it]  2%|▏         | 8/483 [00:26<11:29,  1.45s/it]                                               [2024-07-23 17:44:26] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00002-of-00030.safetensors
+  2%|▏         | 8/483 [00:26<11:29,  1.45s/it]                                               [2024-07-23 17:44:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.1.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+  2%|▏         | 8/483 [00:32<11:29,  1.45s/it]  2%|▏         | 9/483 [00:34<27:20,  3.46s/it]                                               [2024-07-23 17:44:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.1.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+  2%|▏         | 9/483 [00:34<27:20,  3.46s/it]  2%|▏         | 10/483 [00:34<20:33,  2.61s/it]                                                [2024-07-23 17:44:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.1.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+  2%|▏         | 10/483 [00:34<20:33,  2.61s/it]  2%|▏         | 11/483 [00:35<15:14,  1.94s/it]                                                [2024-07-23 17:44:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.1.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+  2%|▏         | 11/483 [00:35<15:14,  1.94s/it]                                                [2024-07-23 17:44:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.1.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+  2%|▏         | 11/483 [00:35<15:14,  1.94s/it]  3%|▎         | 13/483 [00:36<10:53,  1.39s/it]                                                [2024-07-23 17:44:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.1.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+  3%|▎         | 13/483 [00:36<10:53,  1.39s/it]                                                [2024-07-23 17:44:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.2.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+  3%|▎         | 13/483 [00:36<10:53,  1.39s/it]                                                [2024-07-23 17:44:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.2.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+  3%|▎         | 13/483 [00:37<10:53,  1.39s/it]  3%|▎         | 16/483 [00:38<07:23,  1.05it/s]                                                [2024-07-23 17:44:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.2.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+  3%|▎         | 16/483 [00:39<07:23,  1.05it/s]  4%|▎         | 17/483 [00:41<11:34,  1.49s/it]                                                [2024-07-23 17:44:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.2.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+  4%|▎         | 17/483 [00:42<11:34,  1.49s/it]  4%|▎         | 18/483 [00:42<09:15,  1.19s/it]                                                [2024-07-23 17:44:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.2.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+  4%|▎         | 18/483 [00:42<09:15,  1.19s/it]  4%|▍         | 19/483 [00:42<08:04,  1.04s/it]                                                [2024-07-23 17:44:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.2.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+  4%|▍         | 19/483 [00:42<08:04,  1.04s/it]  4%|▍         | 20/483 [00:43<06:48,  1.13it/s]                                                [2024-07-23 17:44:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.3.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+  4%|▍         | 20/483 [00:43<06:48,  1.13it/s]                                                [2024-07-23 17:44:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.3.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+  4%|▍         | 20/483 [00:43<06:48,  1.13it/s]  5%|▍         | 22/483 [00:44<06:20,  1.21it/s]                                                [2024-07-23 17:44:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.3.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+  5%|▍         | 22/483 [00:46<06:20,  1.21it/s]  5%|▍         | 23/483 [00:48<11:03,  1.44s/it]                                                [2024-07-23 17:44:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.3.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+  5%|▍         | 23/483 [00:48<11:03,  1.44s/it]  5%|▍         | 24/483 [00:48<08:30,  1.11s/it]                                                [2024-07-23 17:44:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.3.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+  5%|▍         | 24/483 [00:48<08:30,  1.11s/it]  5%|▌         | 25/483 [00:48<07:23,  1.03it/s]                                                [2024-07-23 17:44:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.3.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+  5%|▌         | 25/483 [00:48<07:23,  1.03it/s]  5%|▌         | 26/483 [00:49<06:12,  1.23it/s]                                                [2024-07-23 17:44:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.4.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+  5%|▌         | 26/483 [00:49<06:12,  1.23it/s]  6%|▌         | 27/483 [00:49<05:38,  1.35it/s]                                                [2024-07-23 17:44:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.4.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+  6%|▌         | 27/483 [00:49<05:38,  1.35it/s]  6%|▌         | 28/483 [00:50<04:56,  1.54it/s]                                                [2024-07-23 17:44:50] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00001-of-00030.safetensors
+  6%|▌         | 28/483 [00:50<04:56,  1.54it/s]                                                [2024-07-23 17:44:50] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00002-of-00030.safetensors
+  6%|▌         | 28/483 [00:50<04:56,  1.54it/s]                                                [2024-07-23 17:44:51] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00005-of-00030.safetensors
+  6%|▌         | 28/483 [00:50<04:56,  1.54it/s]                                                [2024-07-23 17:44:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.10.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+  6%|▌         | 28/483 [00:52<04:56,  1.54it/s]  6%|▌         | 29/483 [00:52<08:34,  1.13s/it]                                                [2024-07-23 17:44:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.10.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+  6%|▌         | 29/483 [00:53<08:34,  1.13s/it]  6%|▌         | 30/483 [00:53<09:23,  1.24s/it]                                                [2024-07-23 17:44:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.10.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+  6%|▌         | 30/483 [00:55<09:23,  1.24s/it]  6%|▋         | 31/483 [00:57<14:28,  1.92s/it]                                                [2024-07-23 17:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.10.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+  6%|▋         | 31/483 [00:57<14:28,  1.92s/it]  7%|▋         | 32/483 [00:57<10:24,  1.38s/it]                                                [2024-07-23 17:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.10.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+  7%|▋         | 32/483 [00:57<10:24,  1.38s/it]  7%|▋         | 33/483 [00:58<08:31,  1.14s/it]                                                [2024-07-23 17:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.10.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+  7%|▋         | 33/483 [00:58<08:31,  1.14s/it]  7%|▋         | 34/483 [00:58<06:51,  1.09it/s]                                                [2024-07-23 17:44:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.11.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+  7%|▋         | 34/483 [00:58<06:51,  1.09it/s]                                                [2024-07-23 17:44:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.11.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+  7%|▋         | 34/483 [00:59<06:51,  1.09it/s]  7%|▋         | 36/483 [01:00<06:15,  1.19it/s]                                                [2024-07-23 17:45:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.11.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+  7%|▋         | 36/483 [01:01<06:15,  1.19it/s]  8%|▊         | 37/483 [01:03<10:53,  1.46s/it]                                                [2024-07-23 17:45:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.11.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+  8%|▊         | 37/483 [01:03<10:53,  1.46s/it]  8%|▊         | 38/483 [01:03<08:14,  1.11s/it]                                                [2024-07-23 17:45:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.11.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+  8%|▊         | 38/483 [01:03<08:14,  1.11s/it]  8%|▊         | 39/483 [01:04<07:03,  1.05it/s]                                                [2024-07-23 17:45:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.11.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+  8%|▊         | 39/483 [01:04<07:03,  1.05it/s]  8%|▊         | 40/483 [01:04<05:53,  1.25it/s]                                                [2024-07-23 17:45:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.12.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+  8%|▊         | 40/483 [01:05<05:53,  1.25it/s]  8%|▊         | 41/483 [01:07<11:09,  1.52s/it]                                                [2024-07-23 17:45:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.12.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+  8%|▊         | 41/483 [01:08<11:09,  1.52s/it]  9%|▊         | 42/483 [01:08<09:18,  1.27s/it]                                                [2024-07-23 17:45:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.12.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+  9%|▊         | 42/483 [01:08<09:18,  1.27s/it]  9%|▉         | 43/483 [01:08<07:26,  1.01s/it]                                                [2024-07-23 17:45:09] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00005-of-00030.safetensors
+  9%|▉         | 43/483 [01:08<07:26,  1.01s/it]                                                [2024-07-23 17:45:09] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00006-of-00030.safetensors
+  9%|▉         | 43/483 [01:08<07:26,  1.01s/it]                                                [2024-07-23 17:45:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.12.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+  9%|▉         | 43/483 [01:11<07:26,  1.01s/it]  9%|▉         | 44/483 [01:11<10:20,  1.41s/it]                                                [2024-07-23 17:45:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.12.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+  9%|▉         | 44/483 [01:11<10:20,  1.41s/it]  9%|▉         | 45/483 [01:12<10:31,  1.44s/it]                                                [2024-07-23 17:45:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.12.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+  9%|▉         | 45/483 [01:12<10:31,  1.44s/it]                                                [2024-07-23 17:45:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.13.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+  9%|▉         | 45/483 [01:12<10:31,  1.44s/it]                                                [2024-07-23 17:45:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.13.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+  9%|▉         | 45/483 [01:13<10:31,  1.44s/it] 10%|▉         | 48/483 [01:14<06:40,  1.09it/s]                                                [2024-07-23 17:45:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.13.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 10%|▉         | 48/483 [01:15<06:40,  1.09it/s] 10%|█         | 49/483 [01:17<10:36,  1.47s/it]                                                [2024-07-23 17:45:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.13.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 10%|█         | 49/483 [01:17<10:36,  1.47s/it] 10%|█         | 50/483 [01:17<08:19,  1.15s/it]                                                [2024-07-23 17:45:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.13.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 10%|█         | 50/483 [01:18<08:19,  1.15s/it] 11%|█         | 51/483 [01:18<07:12,  1.00s/it]                                                [2024-07-23 17:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.13.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 11%|█         | 51/483 [01:18<07:12,  1.00s/it] 11%|█         | 52/483 [01:18<06:03,  1.19it/s]                                                [2024-07-23 17:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.14.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 11%|█         | 52/483 [01:18<06:03,  1.19it/s]                                                [2024-07-23 17:45:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.14.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 11%|█         | 52/483 [01:19<06:03,  1.19it/s] 11%|█         | 54/483 [01:20<05:43,  1.25it/s]                                                [2024-07-23 17:45:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.14.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 11%|█         | 54/483 [01:21<05:43,  1.25it/s] 11%|█▏        | 55/483 [01:23<10:08,  1.42s/it]                                                [2024-07-23 17:45:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.14.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 11%|█▏        | 55/483 [01:23<10:08,  1.42s/it] 12%|█▏        | 56/483 [01:23<07:45,  1.09s/it]                                                [2024-07-23 17:45:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.14.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 12%|█▏        | 56/483 [01:24<07:45,  1.09s/it] 12%|█▏        | 57/483 [01:24<06:41,  1.06it/s]                                                [2024-07-23 17:45:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.14.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 12%|█▏        | 57/483 [01:24<06:41,  1.06it/s] 12%|█▏        | 58/483 [01:24<05:37,  1.26it/s]                                                [2024-07-23 17:45:25] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00007-of-00030.safetensors
+ 12%|█▏        | 58/483 [01:24<05:37,  1.26it/s]                                                [2024-07-23 17:45:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.15.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 12%|█▏        | 58/483 [01:28<05:37,  1.26it/s] 12%|█▏        | 59/483 [01:30<14:42,  2.08s/it]                                                [2024-07-23 17:45:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.15.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 12%|█▏        | 59/483 [01:30<14:42,  2.08s/it] 12%|█▏        | 60/483 [01:30<11:49,  1.68s/it]                                                [2024-07-23 17:45:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.15.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 12%|█▏        | 60/483 [01:30<11:49,  1.68s/it] 13%|█▎        | 61/483 [01:31<09:11,  1.31s/it]                                                [2024-07-23 17:45:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.15.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 13%|█▎        | 61/483 [01:31<09:11,  1.31s/it]                                                [2024-07-23 17:45:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.15.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 13%|█▎        | 61/483 [01:31<09:11,  1.31s/it] 13%|█▎        | 63/483 [01:32<07:24,  1.06s/it]                                                [2024-07-23 17:45:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.15.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 13%|█▎        | 63/483 [01:32<07:24,  1.06s/it]                                                [2024-07-23 17:45:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.16.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 13%|█▎        | 63/483 [01:32<07:24,  1.06s/it]                                                [2024-07-23 17:45:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.16.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 13%|█▎        | 63/483 [01:33<07:24,  1.06s/it] 14%|█▎        | 66/483 [01:34<05:30,  1.26it/s]                                                [2024-07-23 17:45:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.16.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 14%|█▎        | 66/483 [01:35<05:30,  1.26it/s] 14%|█▍        | 67/483 [01:37<09:03,  1.31s/it]                                                [2024-07-23 17:45:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.16.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 14%|█▍        | 67/483 [01:37<09:03,  1.31s/it] 14%|█▍        | 68/483 [01:37<07:16,  1.05s/it]                                                [2024-07-23 17:45:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.16.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 14%|█▍        | 68/483 [01:38<07:16,  1.05s/it] 14%|█▍        | 69/483 [01:38<06:25,  1.08it/s]                                                [2024-07-23 17:45:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.16.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 14%|█▍        | 69/483 [01:38<06:25,  1.08it/s] 14%|█▍        | 70/483 [01:38<05:29,  1.25it/s]                                                [2024-07-23 17:45:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.17.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 14%|█▍        | 70/483 [01:38<05:29,  1.25it/s]                                                [2024-07-23 17:45:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.17.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 14%|█▍        | 70/483 [01:39<05:29,  1.25it/s] 15%|█▍        | 72/483 [01:40<05:19,  1.29it/s]                                                [2024-07-23 17:45:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.17.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 15%|█▍        | 72/483 [01:41<05:19,  1.29it/s] 15%|█▌        | 73/483 [01:43<09:23,  1.37s/it]                                                [2024-07-23 17:45:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.17.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 15%|█▌        | 73/483 [01:43<09:23,  1.37s/it] 15%|█▌        | 74/483 [01:43<07:14,  1.06s/it]                                                [2024-07-23 17:45:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.17.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 15%|█▌        | 74/483 [01:44<07:14,  1.06s/it] 16%|█▌        | 75/483 [01:44<06:17,  1.08it/s]                                                [2024-07-23 17:45:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.17.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 16%|█▌        | 75/483 [01:44<06:17,  1.08it/s] 16%|█▌        | 76/483 [01:44<05:18,  1.28it/s]                                                [2024-07-23 17:45:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.18.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 16%|█▌        | 76/483 [01:44<05:18,  1.28it/s] 16%|█▌        | 77/483 [01:45<04:49,  1.40it/s]                                                [2024-07-23 17:45:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.18.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 16%|█▌        | 77/483 [01:45<04:49,  1.40it/s] 16%|█▌        | 78/483 [01:45<04:12,  1.60it/s]                                                [2024-07-23 17:45:46] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00006-of-00030.safetensors
+ 16%|█▌        | 78/483 [01:45<04:12,  1.60it/s]                                                [2024-07-23 17:45:46] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00007-of-00030.safetensors
+ 16%|█▌        | 78/483 [01:45<04:12,  1.60it/s]                                                [2024-07-23 17:45:46] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00008-of-00030.safetensors
+ 16%|█▌        | 78/483 [01:45<04:12,  1.60it/s]                                                [2024-07-23 17:45:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.18.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 16%|█▌        | 78/483 [01:47<04:12,  1.60it/s] 16%|█▋        | 79/483 [01:47<07:27,  1.11s/it]                                                [2024-07-23 17:45:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.18.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 16%|█▋        | 79/483 [01:48<07:27,  1.11s/it] 17%|█▋        | 80/483 [01:49<08:13,  1.22s/it]                                                [2024-07-23 17:45:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.18.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 17%|█▋        | 80/483 [01:51<08:13,  1.22s/it] 17%|█▋        | 81/483 [01:52<12:37,  1.88s/it]                                                [2024-07-23 17:45:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.18.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 17%|█▋        | 81/483 [01:52<12:37,  1.88s/it] 17%|█▋        | 82/483 [01:53<09:04,  1.36s/it]                                                [2024-07-23 17:45:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.19.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 17%|█▋        | 82/483 [01:53<09:04,  1.36s/it]                                                [2024-07-23 17:45:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.19.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 17%|█▋        | 82/483 [01:53<09:04,  1.36s/it] 17%|█▋        | 84/483 [01:54<07:09,  1.08s/it]                                                [2024-07-23 17:45:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.19.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 17%|█▋        | 84/483 [01:56<07:09,  1.08s/it] 18%|█▊        | 85/483 [01:57<11:01,  1.66s/it]                                                [2024-07-23 17:45:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.19.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 18%|█▊        | 85/483 [01:58<11:01,  1.66s/it] 18%|█▊        | 86/483 [01:58<08:19,  1.26s/it]                                                [2024-07-23 17:45:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.19.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 18%|█▊        | 86/483 [01:58<08:19,  1.26s/it] 18%|█▊        | 87/483 [01:58<07:01,  1.06s/it]                                                [2024-07-23 17:45:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.19.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 18%|█▊        | 87/483 [01:58<07:01,  1.06s/it] 18%|█▊        | 88/483 [01:59<05:49,  1.13it/s]                                                [2024-07-23 17:45:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.20.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 18%|█▊        | 88/483 [01:59<05:49,  1.13it/s]                                                [2024-07-23 17:46:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.20.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 18%|█▊        | 88/483 [01:59<05:49,  1.13it/s] 19%|█▊        | 90/483 [02:00<05:23,  1.22it/s]                                                [2024-07-23 17:46:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.20.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 19%|█▊        | 90/483 [02:02<05:23,  1.22it/s] 19%|█▉        | 91/483 [02:04<09:31,  1.46s/it]                                                [2024-07-23 17:46:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.20.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 19%|█▉        | 91/483 [02:04<09:31,  1.46s/it] 19%|█▉        | 92/483 [02:04<07:14,  1.11s/it]                                                [2024-07-23 17:46:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.20.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 19%|█▉        | 92/483 [02:04<07:14,  1.11s/it] 19%|█▉        | 93/483 [02:04<06:13,  1.04it/s]                                                [2024-07-23 17:46:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.20.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 19%|█▉        | 93/483 [02:04<06:13,  1.04it/s] 19%|█▉        | 94/483 [02:05<05:16,  1.23it/s]                                                [2024-07-23 17:46:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.21.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 19%|█▉        | 94/483 [02:05<05:16,  1.23it/s] 20%|█▉        | 95/483 [02:05<04:45,  1.36it/s]                                                [2024-07-23 17:46:06] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00008-of-00030.safetensors
+ 20%|█▉        | 95/483 [02:05<04:45,  1.36it/s]                                                [2024-07-23 17:46:06] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00009-of-00030.safetensors
+ 20%|█▉        | 95/483 [02:05<04:45,  1.36it/s]                                                [2024-07-23 17:46:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.21.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 20%|█▉        | 95/483 [02:08<04:45,  1.36it/s] 20%|█▉        | 96/483 [02:08<08:04,  1.25s/it]                                                [2024-07-23 17:46:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.21.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 20%|█▉        | 96/483 [02:08<08:04,  1.25s/it] 20%|██        | 97/483 [02:09<08:38,  1.34s/it]                                                [2024-07-23 17:46:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.21.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 20%|██        | 97/483 [02:11<08:38,  1.34s/it] 20%|██        | 98/483 [02:13<13:13,  2.06s/it]                                                [2024-07-23 17:46:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.21.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 20%|██        | 98/483 [02:13<13:13,  2.06s/it] 20%|██        | 99/483 [02:13<09:29,  1.48s/it]                                                [2024-07-23 17:46:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.21.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 20%|██        | 99/483 [02:13<09:29,  1.48s/it] 21%|██        | 100/483 [02:14<07:24,  1.16s/it]                                                 [2024-07-23 17:46:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 21%|██        | 100/483 [02:14<07:24,  1.16s/it]                                                 [2024-07-23 17:46:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 21%|██        | 100/483 [02:14<07:24,  1.16s/it] 21%|██        | 102/483 [02:15<06:09,  1.03it/s]                                                 [2024-07-23 17:46:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 21%|██        | 102/483 [02:17<06:09,  1.03it/s] 21%|██▏       | 103/483 [02:19<10:12,  1.61s/it]                                                 [2024-07-23 17:46:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 21%|██▏       | 103/483 [02:19<10:12,  1.61s/it] 22%|██▏       | 104/483 [02:19<07:42,  1.22s/it]                                                 [2024-07-23 17:46:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 22%|██▏       | 104/483 [02:19<07:42,  1.22s/it] 22%|██▏       | 105/483 [02:19<06:31,  1.03s/it]                                                 [2024-07-23 17:46:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.22.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 22%|██▏       | 105/483 [02:19<06:31,  1.03s/it] 22%|██▏       | 106/483 [02:20<05:22,  1.17it/s]                                                 [2024-07-23 17:46:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 22%|██▏       | 106/483 [02:20<05:22,  1.17it/s]                                                 [2024-07-23 17:46:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 22%|██▏       | 106/483 [02:20<05:22,  1.17it/s] 22%|██▏       | 108/483 [02:21<05:02,  1.24it/s]                                                 [2024-07-23 17:46:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 22%|██▏       | 108/483 [02:23<05:02,  1.24it/s] 23%|██▎       | 109/483 [02:25<09:08,  1.47s/it]                                                 [2024-07-23 17:46:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 23%|██▎       | 109/483 [02:25<09:08,  1.47s/it] 23%|██▎       | 110/483 [02:25<06:57,  1.12s/it]                                                 [2024-07-23 17:46:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 23%|██▎       | 110/483 [02:25<06:57,  1.12s/it] 23%|██▎       | 111/483 [02:25<05:57,  1.04it/s]                                                 [2024-07-23 17:46:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.23.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 23%|██▎       | 111/483 [02:25<05:57,  1.04it/s] 23%|██▎       | 112/483 [02:26<04:58,  1.24it/s]                                                 [2024-07-23 17:46:26] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00009-of-00030.safetensors
+ 23%|██▎       | 112/483 [02:26<04:58,  1.24it/s]                                                 [2024-07-23 17:46:27] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00010-of-00030.safetensors
+ 23%|██▎       | 112/483 [02:26<04:58,  1.24it/s]                                                 [2024-07-23 17:46:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 23%|██▎       | 112/483 [02:28<04:58,  1.24it/s] 23%|██▎       | 113/483 [02:28<07:42,  1.25s/it]                                                 [2024-07-23 17:46:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 23%|██▎       | 113/483 [02:29<07:42,  1.25s/it] 24%|██▎       | 114/483 [02:30<08:11,  1.33s/it]                                                 [2024-07-23 17:46:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 24%|██▎       | 114/483 [02:31<08:11,  1.33s/it] 24%|██▍       | 115/483 [02:33<12:30,  2.04s/it]                                                 [2024-07-23 17:46:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 24%|██▍       | 115/483 [02:33<12:30,  2.04s/it] 24%|██▍       | 116/483 [02:33<09:00,  1.47s/it]                                                 [2024-07-23 17:46:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 24%|██▍       | 116/483 [02:34<09:00,  1.47s/it] 24%|██▍       | 117/483 [02:34<07:17,  1.19s/it]                                                 [2024-07-23 17:46:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.24.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 24%|██▍       | 117/483 [02:34<07:17,  1.19s/it] 24%|██▍       | 118/483 [02:34<05:49,  1.04it/s]                                                 [2024-07-23 17:46:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 24%|██▍       | 118/483 [02:34<05:49,  1.04it/s]                                                 [2024-07-23 17:46:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 24%|██▍       | 118/483 [02:35<05:49,  1.04it/s] 25%|██▍       | 120/483 [02:36<05:13,  1.16it/s]                                                 [2024-07-23 17:46:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 25%|██▍       | 120/483 [02:37<05:13,  1.16it/s] 25%|██▌       | 121/483 [02:39<09:11,  1.52s/it]                                                 [2024-07-23 17:46:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 25%|██▌       | 121/483 [02:39<09:11,  1.52s/it] 25%|██▌       | 122/483 [02:40<06:56,  1.15s/it]                                                 [2024-07-23 17:46:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 25%|██▌       | 122/483 [02:40<06:56,  1.15s/it] 25%|██▌       | 123/483 [02:40<05:54,  1.02it/s]                                                 [2024-07-23 17:46:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.25.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 25%|██▌       | 123/483 [02:40<05:54,  1.02it/s] 26%|██▌       | 124/483 [02:40<04:54,  1.22it/s]                                                 [2024-07-23 17:46:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 26%|██▌       | 124/483 [02:42<04:54,  1.22it/s] 26%|██▌       | 125/483 [02:44<09:30,  1.59s/it]                                                 [2024-07-23 17:46:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 26%|██▌       | 125/483 [02:44<09:30,  1.59s/it] 26%|██▌       | 126/483 [02:45<07:51,  1.32s/it]                                                 [2024-07-23 17:46:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 26%|██▌       | 126/483 [02:45<07:51,  1.32s/it] 26%|██▋       | 127/483 [02:45<06:14,  1.05s/it]                                                 [2024-07-23 17:46:46] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00010-of-00030.safetensors
+ 26%|██▋       | 127/483 [02:45<06:14,  1.05s/it]                                                 [2024-07-23 17:46:46] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00011-of-00030.safetensors
+ 26%|██▋       | 127/483 [02:45<06:14,  1.05s/it]                                                 [2024-07-23 17:46:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 26%|██▋       | 127/483 [02:48<06:14,  1.05s/it] 27%|██▋       | 128/483 [02:48<08:47,  1.48s/it]                                                 [2024-07-23 17:46:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 27%|██▋       | 128/483 [02:48<08:47,  1.48s/it] 27%|██▋       | 129/483 [02:49<08:51,  1.50s/it]                                                 [2024-07-23 17:46:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.26.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 27%|██▋       | 129/483 [02:49<08:51,  1.50s/it]                                                 [2024-07-23 17:46:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 27%|██▋       | 129/483 [02:49<08:51,  1.50s/it]                                                 [2024-07-23 17:46:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 27%|██▋       | 129/483 [02:50<08:51,  1.50s/it] 27%|██▋       | 132/483 [02:51<05:30,  1.06it/s]                                                 [2024-07-23 17:46:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 27%|██▋       | 132/483 [02:52<05:30,  1.06it/s] 28%|██▊       | 133/483 [02:54<08:53,  1.53s/it]                                                 [2024-07-23 17:46:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 28%|██▊       | 133/483 [02:54<08:53,  1.53s/it] 28%|██▊       | 134/483 [02:54<06:57,  1.20s/it]                                                 [2024-07-23 17:46:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 28%|██▊       | 134/483 [02:55<06:57,  1.20s/it] 28%|██▊       | 135/483 [02:55<05:59,  1.03s/it]                                                 [2024-07-23 17:46:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.27.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 28%|██▊       | 135/483 [02:55<05:59,  1.03s/it] 28%|██▊       | 136/483 [02:55<05:00,  1.15it/s]                                                 [2024-07-23 17:46:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 28%|██▊       | 136/483 [02:55<05:00,  1.15it/s]                                                 [2024-07-23 17:46:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 28%|██▊       | 136/483 [02:56<05:00,  1.15it/s] 29%|██▊       | 138/483 [02:57<04:41,  1.23it/s]                                                 [2024-07-23 17:46:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 29%|██▊       | 138/483 [02:58<04:41,  1.23it/s] 29%|██▉       | 139/483 [03:00<08:12,  1.43s/it]                                                 [2024-07-23 17:47:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 29%|██▉       | 139/483 [03:00<08:12,  1.43s/it] 29%|██▉       | 140/483 [03:00<06:17,  1.10s/it]                                                 [2024-07-23 17:47:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 29%|██▉       | 140/483 [03:01<06:17,  1.10s/it] 29%|██▉       | 141/483 [03:01<05:25,  1.05it/s]                                                 [2024-07-23 17:47:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.28.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 29%|██▉       | 141/483 [03:01<05:25,  1.05it/s] 29%|██▉       | 142/483 [03:01<04:32,  1.25it/s]                                                 [2024-07-23 17:47:02] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00012-of-00030.safetensors
+ 29%|██▉       | 142/483 [03:01<04:32,  1.25it/s]                                                 [2024-07-23 17:47:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 29%|██▉       | 142/483 [03:07<04:32,  1.25it/s] 30%|██▉       | 143/483 [03:08<14:34,  2.57s/it]                                                 [2024-07-23 17:47:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 30%|██▉       | 143/483 [03:09<14:34,  2.57s/it] 30%|██▉       | 144/483 [03:09<11:23,  2.02s/it]                                                 [2024-07-23 17:47:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 30%|██▉       | 144/483 [03:09<11:23,  2.02s/it] 30%|███       | 145/483 [03:09<08:42,  1.55s/it]                                                 [2024-07-23 17:47:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 30%|███       | 145/483 [03:09<08:42,  1.55s/it]                                                 [2024-07-23 17:47:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 30%|███       | 145/483 [03:10<08:42,  1.55s/it] 30%|███       | 147/483 [03:11<06:33,  1.17s/it]                                                 [2024-07-23 17:47:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.29.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 30%|███       | 147/483 [03:11<06:33,  1.17s/it]                                                 [2024-07-23 17:47:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 30%|███       | 147/483 [03:11<06:33,  1.17s/it]                                                 [2024-07-23 17:47:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 30%|███       | 147/483 [03:11<06:33,  1.17s/it] 31%|███       | 150/483 [03:12<04:37,  1.20it/s]                                                 [2024-07-23 17:47:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 31%|███       | 150/483 [03:14<04:37,  1.20it/s] 31%|███▏      | 151/483 [03:16<07:05,  1.28s/it]                                                 [2024-07-23 17:47:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 31%|███▏      | 151/483 [03:16<07:05,  1.28s/it] 31%|███▏      | 152/483 [03:16<05:41,  1.03s/it]                                                 [2024-07-23 17:47:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 31%|███▏      | 152/483 [03:16<05:41,  1.03s/it] 32%|███▏      | 153/483 [03:16<05:01,  1.09it/s]                                                 [2024-07-23 17:47:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.30.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 32%|███▏      | 153/483 [03:16<05:01,  1.09it/s] 32%|███▏      | 154/483 [03:17<04:18,  1.27it/s]                                                 [2024-07-23 17:47:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 32%|███▏      | 154/483 [03:17<04:18,  1.27it/s]                                                 [2024-07-23 17:47:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 32%|███▏      | 154/483 [03:17<04:18,  1.27it/s] 32%|███▏      | 156/483 [03:18<04:08,  1.32it/s]                                                 [2024-07-23 17:47:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 32%|███▏      | 156/483 [03:19<04:08,  1.32it/s] 33%|███▎      | 157/483 [03:21<07:08,  1.31s/it]                                                 [2024-07-23 17:47:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 33%|███▎      | 157/483 [03:21<07:08,  1.31s/it] 33%|███▎      | 158/483 [03:21<05:29,  1.02s/it]                                                 [2024-07-23 17:47:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 33%|███▎      | 158/483 [03:22<05:29,  1.02s/it] 33%|███▎      | 159/483 [03:22<04:48,  1.12it/s]                                                 [2024-07-23 17:47:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.31.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 33%|███▎      | 159/483 [03:22<04:48,  1.12it/s] 33%|███▎      | 160/483 [03:22<04:04,  1.32it/s]                                                 [2024-07-23 17:47:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 33%|███▎      | 160/483 [03:22<04:04,  1.32it/s] 33%|███▎      | 161/483 [03:23<03:42,  1.45it/s]                                                 [2024-07-23 17:47:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 33%|███▎      | 161/483 [03:23<03:42,  1.45it/s] 34%|███▎      | 162/483 [03:23<03:15,  1.65it/s]                                                 [2024-07-23 17:47:24] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00012-of-00030.safetensors
+ 34%|███▎      | 162/483 [03:23<03:15,  1.65it/s]                                                 [2024-07-23 17:47:24] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00011-of-00030.safetensors
+ 34%|███▎      | 162/483 [03:23<03:15,  1.65it/s]                                                 [2024-07-23 17:47:24] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00013-of-00030.safetensors
+ 34%|███▎      | 162/483 [03:23<03:15,  1.65it/s]                                                 [2024-07-23 17:47:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 34%|███▎      | 162/483 [03:26<03:15,  1.65it/s] 34%|███▎      | 163/483 [03:26<06:03,  1.14s/it]                                                 [2024-07-23 17:47:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 34%|███▎      | 163/483 [03:26<06:03,  1.14s/it] 34%|███▍      | 164/483 [03:27<06:30,  1.23s/it]                                                 [2024-07-23 17:47:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 34%|███▍      | 164/483 [03:28<06:30,  1.23s/it] 34%|███▍      | 165/483 [03:30<09:36,  1.81s/it]                                                 [2024-07-23 17:47:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.32.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 34%|███▍      | 165/483 [03:30<09:36,  1.81s/it] 34%|███▍      | 166/483 [03:30<06:54,  1.31s/it]                                                 [2024-07-23 17:47:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 34%|███▍      | 166/483 [03:30<06:54,  1.31s/it]                                                 [2024-07-23 17:47:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 34%|███▍      | 166/483 [03:31<06:54,  1.31s/it] 35%|███▍      | 168/483 [03:32<05:26,  1.04s/it]                                                 [2024-07-23 17:47:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 35%|███▍      | 168/483 [03:33<05:26,  1.04s/it] 35%|███▍      | 169/483 [03:35<08:14,  1.57s/it]                                                 [2024-07-23 17:47:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 35%|███▍      | 169/483 [03:35<08:14,  1.57s/it] 35%|███▌      | 170/483 [03:35<06:12,  1.19s/it]                                                 [2024-07-23 17:47:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 35%|███▌      | 170/483 [03:35<06:12,  1.19s/it] 35%|███▌      | 171/483 [03:36<05:15,  1.01s/it]                                                 [2024-07-23 17:47:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.33.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 35%|███▌      | 171/483 [03:36<05:15,  1.01s/it] 36%|███▌      | 172/483 [03:36<04:23,  1.18it/s]                                                 [2024-07-23 17:47:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 36%|███▌      | 172/483 [03:36<04:23,  1.18it/s]                                                 [2024-07-23 17:47:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 36%|███▌      | 172/483 [03:37<04:23,  1.18it/s] 36%|███▌      | 174/483 [03:38<04:04,  1.27it/s]                                                 [2024-07-23 17:47:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 36%|███▌      | 174/483 [03:39<04:04,  1.27it/s] 36%|███▌      | 175/483 [03:41<07:03,  1.38s/it]                                                 [2024-07-23 17:47:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 36%|███▌      | 175/483 [03:41<07:03,  1.38s/it] 36%|███▋      | 176/483 [03:41<05:22,  1.05s/it]                                                 [2024-07-23 17:47:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 36%|███▋      | 176/483 [03:41<05:22,  1.05s/it] 37%|███▋      | 177/483 [03:41<04:39,  1.10it/s]                                                 [2024-07-23 17:47:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.34.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 37%|███▋      | 177/483 [03:42<04:39,  1.10it/s] 37%|███▋      | 178/483 [03:42<03:57,  1.29it/s]                                                 [2024-07-23 17:47:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 37%|███▋      | 178/483 [03:42<03:57,  1.29it/s] 37%|███▋      | 179/483 [03:42<03:36,  1.41it/s]                                                 [2024-07-23 17:47:43] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00013-of-00030.safetensors
+ 37%|███▋      | 179/483 [03:42<03:36,  1.41it/s]                                                 [2024-07-23 17:47:43] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00014-of-00030.safetensors
+ 37%|███▋      | 179/483 [03:43<03:36,  1.41it/s]                                                 [2024-07-23 17:47:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 37%|███▋      | 179/483 [03:45<03:36,  1.41it/s] 37%|███▋      | 180/483 [03:45<06:41,  1.32s/it]                                                 [2024-07-23 17:47:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 37%|███▋      | 180/483 [03:46<06:41,  1.32s/it] 37%|███▋      | 181/483 [03:47<06:51,  1.36s/it]                                                 [2024-07-23 17:47:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 37%|███▋      | 181/483 [03:48<06:51,  1.36s/it] 38%|███▊      | 182/483 [03:50<09:37,  1.92s/it]                                                 [2024-07-23 17:47:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 38%|███▊      | 182/483 [03:50<09:37,  1.92s/it] 38%|███▊      | 183/483 [03:50<06:54,  1.38s/it]                                                 [2024-07-23 17:47:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.35.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 38%|███▊      | 183/483 [03:50<06:54,  1.38s/it] 38%|███▊      | 184/483 [03:50<05:25,  1.09s/it]                                                 [2024-07-23 17:47:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 38%|███▊      | 184/483 [03:50<05:25,  1.09s/it]                                                 [2024-07-23 17:47:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 38%|███▊      | 184/483 [03:51<05:25,  1.09s/it] 39%|███▊      | 186/483 [03:52<04:33,  1.09it/s]                                                 [2024-07-23 17:47:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 39%|███▊      | 186/483 [03:53<04:33,  1.09it/s] 39%|███▊      | 187/483 [03:55<07:18,  1.48s/it]                                                 [2024-07-23 17:47:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 39%|███▊      | 187/483 [03:55<07:18,  1.48s/it] 39%|███▉      | 188/483 [03:55<05:31,  1.12s/it]                                                 [2024-07-23 17:47:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 39%|███▉      | 188/483 [03:55<05:31,  1.12s/it] 39%|███▉      | 189/483 [03:56<04:43,  1.04it/s]                                                 [2024-07-23 17:47:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.36.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 39%|███▉      | 189/483 [03:56<04:43,  1.04it/s] 39%|███▉      | 190/483 [03:56<03:55,  1.24it/s]                                                 [2024-07-23 17:47:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 39%|███▉      | 190/483 [03:56<03:55,  1.24it/s]                                                 [2024-07-23 17:47:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 39%|███▉      | 190/483 [03:57<03:55,  1.24it/s] 40%|███▉      | 192/483 [03:58<03:43,  1.30it/s]                                                 [2024-07-23 17:48:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 40%|███▉      | 192/483 [03:59<03:43,  1.30it/s] 40%|███▉      | 193/483 [04:01<06:31,  1.35s/it]                                                 [2024-07-23 17:48:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 40%|███▉      | 193/483 [04:01<06:31,  1.35s/it] 40%|████      | 194/483 [04:01<04:58,  1.03s/it]                                                 [2024-07-23 17:48:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 40%|████      | 194/483 [04:01<04:58,  1.03s/it] 40%|████      | 195/483 [04:01<04:18,  1.11it/s]                                                 [2024-07-23 17:48:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.37.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 40%|████      | 195/483 [04:02<04:18,  1.11it/s] 41%|████      | 196/483 [04:02<03:37,  1.32it/s]                                                 [2024-07-23 17:48:02] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00014-of-00030.safetensors
+ 41%|████      | 196/483 [04:02<03:37,  1.32it/s]                                                 [2024-07-23 17:48:03] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00015-of-00030.safetensors
+ 41%|████      | 196/483 [04:02<03:37,  1.32it/s]                                                 [2024-07-23 17:48:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 41%|████      | 196/483 [04:04<03:37,  1.32it/s] 41%|████      | 197/483 [04:04<05:38,  1.18s/it]                                                 [2024-07-23 17:48:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 41%|████      | 197/483 [04:05<05:38,  1.18s/it] 41%|████      | 198/483 [04:05<05:58,  1.26s/it]                                                 [2024-07-23 17:48:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 41%|████      | 198/483 [04:07<05:58,  1.26s/it] 41%|████      | 199/483 [04:09<08:39,  1.83s/it]                                                 [2024-07-23 17:48:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 41%|████      | 199/483 [04:09<08:39,  1.83s/it] 41%|████▏     | 200/483 [04:09<06:13,  1.32s/it]                                                 [2024-07-23 17:48:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 41%|████▏     | 200/483 [04:09<06:13,  1.32s/it] 42%|████▏     | 201/483 [04:09<05:06,  1.09s/it]                                                 [2024-07-23 17:48:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.38.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 42%|████▏     | 201/483 [04:09<05:06,  1.09s/it] 42%|████▏     | 202/483 [04:10<04:08,  1.13it/s]                                                 [2024-07-23 17:48:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 42%|████▏     | 202/483 [04:10<04:08,  1.13it/s]                                                 [2024-07-23 17:48:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 42%|████▏     | 202/483 [04:10<04:08,  1.13it/s] 42%|████▏     | 204/483 [04:11<03:45,  1.24it/s]                                                 [2024-07-23 17:48:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 42%|████▏     | 204/483 [04:12<03:45,  1.24it/s] 42%|████▏     | 205/483 [04:14<06:28,  1.40s/it]                                                 [2024-07-23 17:48:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 42%|████▏     | 205/483 [04:14<06:28,  1.40s/it] 43%|████▎     | 206/483 [04:14<04:53,  1.06s/it]                                                 [2024-07-23 17:48:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 43%|████▎     | 206/483 [04:15<04:53,  1.06s/it] 43%|████▎     | 207/483 [04:15<04:12,  1.09it/s]                                                 [2024-07-23 17:48:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.39.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 43%|████▎     | 207/483 [04:15<04:12,  1.09it/s] 43%|████▎     | 208/483 [04:15<03:31,  1.30it/s]                                                 [2024-07-23 17:48:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 43%|████▎     | 208/483 [04:17<03:31,  1.30it/s] 43%|████▎     | 209/483 [04:19<06:40,  1.46s/it]                                                 [2024-07-23 17:48:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 43%|████▎     | 209/483 [04:19<06:40,  1.46s/it] 43%|████▎     | 210/483 [04:19<05:33,  1.22s/it]                                                 [2024-07-23 17:48:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 43%|████▎     | 210/483 [04:19<05:33,  1.22s/it] 44%|████▎     | 211/483 [04:20<04:26,  1.02it/s]                                                 [2024-07-23 17:48:20] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00015-of-00030.safetensors
+ 44%|████▎     | 211/483 [04:20<04:26,  1.02it/s]                                                 [2024-07-23 17:48:20] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00003-of-00030.safetensors
+ 44%|████▎     | 211/483 [04:20<04:26,  1.02it/s]                                                 [2024-07-23 17:48:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.4.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 44%|████▎     | 211/483 [04:23<04:26,  1.02it/s] 44%|████▍     | 212/483 [04:23<07:11,  1.59s/it]                                                 [2024-07-23 17:48:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.4.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 44%|████▍     | 212/483 [04:23<07:11,  1.59s/it] 44%|████▍     | 213/483 [04:24<06:59,  1.55s/it]                                                 [2024-07-23 17:48:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.4.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 44%|████▍     | 213/483 [04:25<06:59,  1.55s/it] 44%|████▍     | 214/483 [04:27<09:09,  2.04s/it]                                                 [2024-07-23 17:48:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.4.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 44%|████▍     | 214/483 [04:27<09:09,  2.04s/it] 45%|████▍     | 215/483 [04:27<06:32,  1.47s/it]                                                 [2024-07-23 17:48:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 45%|████▍     | 215/483 [04:27<06:32,  1.47s/it]                                                 [2024-07-23 17:48:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 45%|████▍     | 215/483 [04:28<06:32,  1.47s/it] 45%|████▍     | 217/483 [04:29<04:58,  1.12s/it]                                                 [2024-07-23 17:48:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 45%|████▍     | 217/483 [04:30<04:58,  1.12s/it] 45%|████▌     | 218/483 [04:32<07:12,  1.63s/it]                                                 [2024-07-23 17:48:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 45%|████▌     | 218/483 [04:32<07:12,  1.63s/it] 45%|████▌     | 219/483 [04:32<05:25,  1.23s/it]                                                 [2024-07-23 17:48:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 45%|████▌     | 219/483 [04:32<05:25,  1.23s/it] 46%|████▌     | 220/483 [04:33<04:33,  1.04s/it]                                                 [2024-07-23 17:48:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.5.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 46%|████▌     | 220/483 [04:33<04:33,  1.04s/it] 46%|████▌     | 221/483 [04:33<03:45,  1.16it/s]                                                 [2024-07-23 17:48:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 46%|████▌     | 221/483 [04:33<03:45,  1.16it/s]                                                 [2024-07-23 17:48:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 46%|████▌     | 221/483 [04:34<03:45,  1.16it/s] 46%|████▌     | 223/483 [04:34<03:27,  1.25it/s]                                                 [2024-07-23 17:48:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 46%|████▌     | 223/483 [04:36<03:27,  1.25it/s] 46%|████▋     | 224/483 [04:38<05:55,  1.37s/it]                                                 [2024-07-23 17:48:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 46%|████▋     | 224/483 [04:38<05:55,  1.37s/it] 47%|████▋     | 225/483 [04:38<04:30,  1.05s/it]                                                 [2024-07-23 17:48:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 47%|████▋     | 225/483 [04:38<04:30,  1.05s/it] 47%|████▋     | 226/483 [04:38<03:53,  1.10it/s]                                                 [2024-07-23 17:48:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.6.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 47%|████▋     | 226/483 [04:38<03:53,  1.10it/s] 47%|████▋     | 227/483 [04:39<03:16,  1.30it/s]                                                 [2024-07-23 17:48:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 47%|████▋     | 227/483 [04:39<03:16,  1.30it/s] 47%|████▋     | 228/483 [04:39<02:58,  1.43it/s]                                                 [2024-07-23 17:48:40] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00003-of-00030.safetensors
+ 47%|████▋     | 228/483 [04:39<02:58,  1.43it/s]                                                 [2024-07-23 17:48:40] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00016-of-00030.safetensors
+ 47%|████▋     | 228/483 [04:39<02:58,  1.43it/s]                                                 [2024-07-23 17:48:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 47%|████▋     | 228/483 [04:42<02:58,  1.43it/s] 47%|████▋     | 229/483 [04:42<05:11,  1.23s/it]                                                 [2024-07-23 17:48:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 47%|████▋     | 229/483 [04:42<05:11,  1.23s/it] 48%|████▊     | 230/483 [04:43<05:28,  1.30s/it]                                                 [2024-07-23 17:48:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.40.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 48%|████▊     | 230/483 [04:43<05:28,  1.30s/it]                                                 [2024-07-23 17:48:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 48%|████▊     | 230/483 [04:43<05:28,  1.30s/it]                                                 [2024-07-23 17:48:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 48%|████▊     | 230/483 [04:44<05:28,  1.30s/it] 48%|████▊     | 233/483 [04:45<03:32,  1.18it/s]                                                 [2024-07-23 17:48:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 48%|████▊     | 233/483 [04:46<03:32,  1.18it/s] 48%|████▊     | 234/483 [04:48<05:33,  1.34s/it]                                                 [2024-07-23 17:48:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 48%|████▊     | 234/483 [04:48<05:33,  1.34s/it] 49%|████▊     | 235/483 [04:48<04:22,  1.06s/it]                                                 [2024-07-23 17:48:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 49%|████▊     | 235/483 [04:48<04:22,  1.06s/it] 49%|████▉     | 236/483 [04:49<03:49,  1.07it/s]                                                 [2024-07-23 17:48:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.41.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 49%|████▉     | 236/483 [04:49<03:49,  1.07it/s] 49%|████▉     | 237/483 [04:49<03:14,  1.27it/s]                                                 [2024-07-23 17:48:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 49%|████▉     | 237/483 [04:49<03:14,  1.27it/s]                                                 [2024-07-23 17:48:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 49%|████▉     | 237/483 [04:50<03:14,  1.27it/s] 49%|████▉     | 239/483 [04:50<03:05,  1.31it/s]                                                 [2024-07-23 17:48:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 49%|████▉     | 239/483 [04:52<03:05,  1.31it/s] 50%|████▉     | 240/483 [04:54<05:24,  1.33s/it]                                                 [2024-07-23 17:48:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 50%|████▉     | 240/483 [04:54<05:24,  1.33s/it] 50%|████▉     | 241/483 [04:54<04:08,  1.03s/it]                                                 [2024-07-23 17:48:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 50%|████▉     | 241/483 [04:54<04:08,  1.03s/it] 50%|█████     | 242/483 [04:54<03:35,  1.12it/s]                                                 [2024-07-23 17:48:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.42.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 50%|█████     | 242/483 [04:54<03:35,  1.12it/s] 50%|█████     | 243/483 [04:55<03:02,  1.32it/s]                                                 [2024-07-23 17:48:55] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00017-of-00030.safetensors
+ 50%|█████     | 243/483 [04:55<03:02,  1.32it/s]                                                 [2024-07-23 17:49:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 50%|█████     | 243/483 [05:00<03:02,  1.32it/s] 51%|█████     | 244/483 [05:02<09:59,  2.51s/it]                                                 [2024-07-23 17:49:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 51%|█████     | 244/483 [05:02<09:59,  2.51s/it] 51%|█████     | 245/483 [05:02<07:49,  1.97s/it]                                                 [2024-07-23 17:49:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 51%|█████     | 245/483 [05:02<07:49,  1.97s/it] 51%|█████     | 246/483 [05:03<05:59,  1.52s/it]                                                 [2024-07-23 17:49:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 51%|█████     | 246/483 [05:03<05:59,  1.52s/it]                                                 [2024-07-23 17:49:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 51%|█████     | 246/483 [05:03<05:59,  1.52s/it] 51%|█████▏    | 248/483 [05:04<04:31,  1.15s/it]                                                 [2024-07-23 17:49:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.43.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 51%|█████▏    | 248/483 [05:04<04:31,  1.15s/it]                                                 [2024-07-23 17:49:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 51%|█████▏    | 248/483 [05:04<04:31,  1.15s/it]                                                 [2024-07-23 17:49:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 51%|█████▏    | 248/483 [05:05<04:31,  1.15s/it] 52%|█████▏    | 251/483 [05:06<03:11,  1.21it/s]                                                 [2024-07-23 17:49:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 52%|█████▏    | 251/483 [05:07<03:11,  1.21it/s] 52%|█████▏    | 252/483 [05:09<04:53,  1.27s/it]                                                 [2024-07-23 17:49:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 52%|█████▏    | 252/483 [05:09<04:53,  1.27s/it] 52%|█████▏    | 253/483 [05:09<03:55,  1.02s/it]                                                 [2024-07-23 17:49:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 52%|█████▏    | 253/483 [05:09<03:55,  1.02s/it] 53%|█████▎    | 254/483 [05:09<03:28,  1.10it/s]                                                 [2024-07-23 17:49:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.44.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 53%|█████▎    | 254/483 [05:10<03:28,  1.10it/s] 53%|█████▎    | 255/483 [05:10<02:58,  1.28it/s]                                                 [2024-07-23 17:49:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 53%|█████▎    | 255/483 [05:10<02:58,  1.28it/s]                                                 [2024-07-23 17:49:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 53%|█████▎    | 255/483 [05:10<02:58,  1.28it/s] 53%|█████▎    | 257/483 [05:11<02:50,  1.33it/s]                                                 [2024-07-23 17:49:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 53%|█████▎    | 257/483 [05:13<02:50,  1.33it/s] 53%|█████▎    | 258/483 [05:14<04:55,  1.31s/it]                                                 [2024-07-23 17:49:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 53%|█████▎    | 258/483 [05:14<04:55,  1.31s/it] 54%|█████▎    | 259/483 [05:14<03:47,  1.01s/it]                                                 [2024-07-23 17:49:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 54%|█████▎    | 259/483 [05:15<03:47,  1.01s/it] 54%|█████▍    | 260/483 [05:15<03:17,  1.13it/s]                                                 [2024-07-23 17:49:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.45.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 54%|█████▍    | 260/483 [05:15<03:17,  1.13it/s] 54%|█████▍    | 261/483 [05:15<02:47,  1.32it/s]                                                 [2024-07-23 17:49:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 54%|█████▍    | 261/483 [05:16<02:47,  1.32it/s] 54%|█████▍    | 262/483 [05:16<02:32,  1.45it/s]                                                 [2024-07-23 17:49:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 54%|█████▍    | 262/483 [05:16<02:32,  1.45it/s] 54%|█████▍    | 263/483 [05:16<02:13,  1.64it/s]                                                 [2024-07-23 17:49:17] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00016-of-00030.safetensors
+ 54%|█████▍    | 263/483 [05:16<02:13,  1.64it/s]                                                 [2024-07-23 17:49:17] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00017-of-00030.safetensors
+ 54%|█████▍    | 263/483 [05:16<02:13,  1.64it/s]                                                 [2024-07-23 17:49:17] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00018-of-00030.safetensors
+ 54%|█████▍    | 263/483 [05:17<02:13,  1.64it/s]                                                 [2024-07-23 17:49:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 54%|█████▍    | 263/483 [05:19<02:13,  1.64it/s] 55%|█████▍    | 264/483 [05:19<04:12,  1.15s/it]                                                 [2024-07-23 17:49:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 55%|█████▍    | 264/483 [05:19<04:12,  1.15s/it] 55%|█████▍    | 265/483 [05:20<04:30,  1.24s/it]                                                 [2024-07-23 17:49:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 55%|█████▍    | 265/483 [05:22<04:30,  1.24s/it] 55%|█████▌    | 266/483 [05:23<06:34,  1.82s/it]                                                 [2024-07-23 17:49:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.46.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 55%|█████▌    | 266/483 [05:24<06:34,  1.82s/it] 55%|█████▌    | 267/483 [05:24<04:42,  1.31s/it]                                                 [2024-07-23 17:49:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 55%|█████▌    | 267/483 [05:24<04:42,  1.31s/it]                                                 [2024-07-23 17:49:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 55%|█████▌    | 267/483 [05:24<04:42,  1.31s/it] 56%|█████▌    | 269/483 [05:25<03:41,  1.04s/it]                                                 [2024-07-23 17:49:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 56%|█████▌    | 269/483 [05:26<03:41,  1.04s/it] 56%|█████▌    | 270/483 [05:28<05:33,  1.57s/it]                                                 [2024-07-23 17:49:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 56%|█████▌    | 270/483 [05:28<05:33,  1.57s/it] 56%|█████▌    | 271/483 [05:28<04:11,  1.19s/it]                                                 [2024-07-23 17:49:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 56%|█████▌    | 271/483 [05:29<04:11,  1.19s/it] 56%|█████▋    | 272/483 [05:29<03:33,  1.01s/it]                                                 [2024-07-23 17:49:30] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.47.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 56%|█████▋    | 272/483 [05:29<03:33,  1.01s/it] 57%|█████▋    | 273/483 [05:29<02:57,  1.18it/s]                                                 [2024-07-23 17:49:30] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 57%|█████▋    | 273/483 [05:29<02:57,  1.18it/s]                                                 [2024-07-23 17:49:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 57%|█████▋    | 273/483 [05:30<02:57,  1.18it/s] 57%|█████▋    | 275/483 [05:31<02:44,  1.26it/s]                                                 [2024-07-23 17:49:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 57%|█████▋    | 275/483 [05:32<02:44,  1.26it/s] 57%|█████▋    | 276/483 [05:34<04:44,  1.37s/it]                                                 [2024-07-23 17:49:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 57%|█████▋    | 276/483 [05:34<04:44,  1.37s/it] 57%|█████▋    | 277/483 [05:34<03:35,  1.05s/it]                                                 [2024-07-23 17:49:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 57%|█████▋    | 277/483 [05:34<03:35,  1.05s/it] 58%|█████▊    | 278/483 [05:35<03:07,  1.10it/s]                                                 [2024-07-23 17:49:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.48.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 58%|█████▊    | 278/483 [05:35<03:07,  1.10it/s] 58%|█████▊    | 279/483 [05:35<02:38,  1.29it/s]                                                 [2024-07-23 17:49:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 58%|█████▊    | 279/483 [05:35<02:38,  1.29it/s] 58%|█████▊    | 280/483 [05:36<02:24,  1.41it/s]                                                 [2024-07-23 17:49:36] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00018-of-00030.safetensors
+ 58%|█████▊    | 280/483 [05:36<02:24,  1.41it/s]                                                 [2024-07-23 17:49:36] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00019-of-00030.safetensors
+ 58%|█████▊    | 280/483 [05:36<02:24,  1.41it/s]                                                 [2024-07-23 17:49:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 58%|█████▊    | 280/483 [05:38<02:24,  1.41it/s] 58%|█████▊    | 281/483 [05:38<04:31,  1.34s/it]                                                 [2024-07-23 17:49:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 58%|█████▊    | 281/483 [05:39<04:31,  1.34s/it] 58%|█████▊    | 282/483 [05:40<04:36,  1.38s/it]                                                 [2024-07-23 17:49:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 58%|█████▊    | 282/483 [05:41<04:36,  1.38s/it] 59%|█████▊    | 283/483 [05:43<06:23,  1.92s/it]                                                 [2024-07-23 17:49:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 59%|█████▊    | 283/483 [05:43<06:23,  1.92s/it] 59%|█████▉    | 284/483 [05:43<04:34,  1.38s/it]                                                 [2024-07-23 17:49:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.49.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 59%|█████▉    | 284/483 [05:43<04:34,  1.38s/it] 59%|█████▉    | 285/483 [05:44<03:35,  1.09s/it]                                                 [2024-07-23 17:49:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 59%|█████▉    | 285/483 [05:44<03:35,  1.09s/it]                                                 [2024-07-23 17:49:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 59%|█████▉    | 285/483 [05:44<03:35,  1.09s/it] 59%|█████▉    | 287/483 [05:45<03:00,  1.09it/s]                                                 [2024-07-23 17:49:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 59%|█████▉    | 287/483 [05:46<03:00,  1.09it/s] 60%|█████▉    | 288/483 [05:48<04:50,  1.49s/it]                                                 [2024-07-23 17:49:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 60%|█████▉    | 288/483 [05:48<04:50,  1.49s/it] 60%|█████▉    | 289/483 [05:48<03:38,  1.13s/it]                                                 [2024-07-23 17:49:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 60%|█████▉    | 289/483 [05:49<03:38,  1.13s/it] 60%|██████    | 290/483 [05:49<03:06,  1.03it/s]                                                 [2024-07-23 17:49:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.50.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 60%|██████    | 290/483 [05:49<03:06,  1.03it/s] 60%|██████    | 291/483 [05:49<02:34,  1.24it/s]                                                 [2024-07-23 17:49:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 60%|██████    | 291/483 [05:49<02:34,  1.24it/s]                                                 [2024-07-23 17:49:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 60%|██████    | 291/483 [05:50<02:34,  1.24it/s] 61%|██████    | 293/483 [05:51<02:26,  1.30it/s]                                                 [2024-07-23 17:49:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 61%|██████    | 293/483 [05:52<02:26,  1.30it/s] 61%|██████    | 294/483 [05:54<04:14,  1.35s/it]                                                 [2024-07-23 17:49:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 61%|██████    | 294/483 [05:54<04:14,  1.35s/it] 61%|██████    | 295/483 [05:54<03:13,  1.03s/it]                                                 [2024-07-23 17:49:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 61%|██████    | 295/483 [05:54<03:13,  1.03s/it] 61%|██████▏   | 296/483 [05:55<02:47,  1.12it/s]                                                 [2024-07-23 17:49:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.51.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 61%|██████▏   | 296/483 [05:55<02:47,  1.12it/s] 61%|██████▏   | 297/483 [05:55<02:20,  1.32it/s]                                                 [2024-07-23 17:49:56] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00019-of-00030.safetensors
+ 61%|██████▏   | 297/483 [05:55<02:20,  1.32it/s]                                                 [2024-07-23 17:49:56] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00020-of-00030.safetensors
+ 61%|██████▏   | 297/483 [05:55<02:20,  1.32it/s]                                                 [2024-07-23 17:49:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 61%|██████▏   | 297/483 [05:57<02:20,  1.32it/s] 62%|██████▏   | 298/483 [05:57<03:49,  1.24s/it]                                                 [2024-07-23 17:49:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 62%|██████▏   | 298/483 [05:58<03:49,  1.24s/it] 62%|██████▏   | 299/483 [05:59<03:59,  1.30s/it]                                                 [2024-07-23 17:50:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 62%|██████▏   | 299/483 [06:00<03:59,  1.30s/it] 62%|██████▏   | 300/483 [06:02<05:38,  1.85s/it]                                                 [2024-07-23 17:50:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 62%|██████▏   | 300/483 [06:02<05:38,  1.85s/it] 62%|██████▏   | 301/483 [06:02<04:03,  1.34s/it]                                                 [2024-07-23 17:50:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 62%|██████▏   | 301/483 [06:02<04:03,  1.34s/it] 63%|██████▎   | 302/483 [06:03<03:18,  1.10s/it]                                                 [2024-07-23 17:50:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.52.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 63%|██████▎   | 302/483 [06:03<03:18,  1.10s/it] 63%|██████▎   | 303/483 [06:03<02:40,  1.12it/s]                                                 [2024-07-23 17:50:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 63%|██████▎   | 303/483 [06:03<02:40,  1.12it/s]                                                 [2024-07-23 17:50:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 63%|██████▎   | 303/483 [06:04<02:40,  1.12it/s] 63%|██████▎   | 305/483 [06:05<02:24,  1.23it/s]                                                 [2024-07-23 17:50:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 63%|██████▎   | 305/483 [06:06<02:24,  1.23it/s] 63%|██████▎   | 306/483 [06:08<04:07,  1.40s/it]                                                 [2024-07-23 17:50:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 63%|██████▎   | 306/483 [06:08<04:07,  1.40s/it] 64%|██████▎   | 307/483 [06:08<03:06,  1.06s/it]                                                 [2024-07-23 17:50:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 64%|██████▎   | 307/483 [06:08<03:06,  1.06s/it] 64%|██████▍   | 308/483 [06:08<02:40,  1.09it/s]                                                 [2024-07-23 17:50:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.53.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 64%|██████▍   | 308/483 [06:09<02:40,  1.09it/s] 64%|██████▍   | 309/483 [06:09<02:14,  1.29it/s]                                                 [2024-07-23 17:50:11] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 64%|██████▍   | 309/483 [06:10<02:14,  1.29it/s] 64%|██████▍   | 310/483 [06:12<04:10,  1.45s/it]                                                 [2024-07-23 17:50:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 64%|██████▍   | 310/483 [06:12<04:10,  1.45s/it] 64%|██████▍   | 311/483 [06:13<03:28,  1.21s/it]                                                 [2024-07-23 17:50:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 64%|██████▍   | 311/483 [06:13<03:28,  1.21s/it] 65%|██████▍   | 312/483 [06:13<02:46,  1.03it/s]                                                 [2024-07-23 17:50:14] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00020-of-00030.safetensors
+ 65%|██████▍   | 312/483 [06:13<02:46,  1.03it/s]                                                 [2024-07-23 17:50:14] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00021-of-00030.safetensors
+ 65%|██████▍   | 312/483 [06:13<02:46,  1.03it/s]                                                 [2024-07-23 17:50:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 65%|██████▍   | 312/483 [06:16<02:46,  1.03it/s] 65%|██████▍   | 313/483 [06:16<04:06,  1.45s/it]                                                 [2024-07-23 17:50:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 65%|██████▍   | 313/483 [06:16<04:06,  1.45s/it] 65%|██████▌   | 314/483 [06:17<04:05,  1.45s/it]                                                 [2024-07-23 17:50:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.54.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 65%|██████▌   | 314/483 [06:17<04:05,  1.45s/it]                                                 [2024-07-23 17:50:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 65%|██████▌   | 314/483 [06:17<04:05,  1.45s/it]                                                 [2024-07-23 17:50:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 65%|██████▌   | 314/483 [06:18<04:05,  1.45s/it] 66%|██████▌   | 317/483 [06:18<02:31,  1.10it/s]                                                 [2024-07-23 17:50:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 66%|██████▌   | 317/483 [06:20<02:31,  1.10it/s] 66%|██████▌   | 318/483 [06:22<03:49,  1.39s/it]                                                 [2024-07-23 17:50:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 66%|██████▌   | 318/483 [06:22<03:49,  1.39s/it] 66%|██████▌   | 319/483 [06:22<02:59,  1.09s/it]                                                 [2024-07-23 17:50:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 66%|██████▌   | 319/483 [06:22<02:59,  1.09s/it] 66%|██████▋   | 320/483 [06:22<02:35,  1.05it/s]                                                 [2024-07-23 17:50:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.55.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 66%|██████▋   | 320/483 [06:22<02:35,  1.05it/s] 66%|██████▋   | 321/483 [06:23<02:10,  1.24it/s]                                                 [2024-07-23 17:50:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 66%|██████▋   | 321/483 [06:23<02:10,  1.24it/s]                                                 [2024-07-23 17:50:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 66%|██████▋   | 321/483 [06:23<02:10,  1.24it/s] 67%|██████▋   | 323/483 [06:24<02:03,  1.30it/s]                                                 [2024-07-23 17:50:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 67%|██████▋   | 323/483 [06:25<02:03,  1.30it/s] 67%|██████▋   | 324/483 [06:27<03:32,  1.34s/it]                                                 [2024-07-23 17:50:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 67%|██████▋   | 324/483 [06:27<03:32,  1.34s/it] 67%|██████▋   | 325/483 [06:27<02:42,  1.03s/it]                                                 [2024-07-23 17:50:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 67%|██████▋   | 325/483 [06:28<02:42,  1.03s/it] 67%|██████▋   | 326/483 [06:28<02:20,  1.12it/s]                                                 [2024-07-23 17:50:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.56.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 67%|██████▋   | 326/483 [06:28<02:20,  1.12it/s] 68%|██████▊   | 327/483 [06:28<01:58,  1.32it/s]                                                 [2024-07-23 17:50:29] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00022-of-00030.safetensors
+ 68%|██████▊   | 327/483 [06:28<01:58,  1.32it/s]                                                 [2024-07-23 17:50:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 68%|██████▊   | 327/483 [06:33<01:58,  1.32it/s] 68%|██████▊   | 328/483 [06:35<06:17,  2.44s/it]                                                 [2024-07-23 17:50:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 68%|██████▊   | 328/483 [06:35<06:17,  2.44s/it] 68%|██████▊   | 329/483 [06:36<04:55,  1.92s/it]                                                 [2024-07-23 17:50:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 68%|██████▊   | 329/483 [06:36<04:55,  1.92s/it] 68%|██████▊   | 330/483 [06:36<03:46,  1.48s/it]                                                 [2024-07-23 17:50:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 68%|██████▊   | 330/483 [06:36<03:46,  1.48s/it]                                                 [2024-07-23 17:50:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 68%|██████▊   | 330/483 [06:37<03:46,  1.48s/it] 69%|██████▊   | 332/483 [06:38<02:51,  1.13s/it]                                                 [2024-07-23 17:50:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.57.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 69%|██████▊   | 332/483 [06:38<02:51,  1.13s/it]                                                 [2024-07-23 17:50:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 69%|██████▊   | 332/483 [06:38<02:51,  1.13s/it]                                                 [2024-07-23 17:50:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 69%|██████▊   | 332/483 [06:38<02:51,  1.13s/it] 69%|██████▉   | 335/483 [06:39<02:00,  1.23it/s]                                                 [2024-07-23 17:50:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 69%|██████▉   | 335/483 [06:40<02:00,  1.23it/s] 70%|██████▉   | 336/483 [06:42<03:06,  1.27s/it]                                                 [2024-07-23 17:50:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 70%|██████▉   | 336/483 [06:42<03:06,  1.27s/it] 70%|██████▉   | 337/483 [06:42<02:28,  1.02s/it]                                                 [2024-07-23 17:50:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 70%|██████▉   | 337/483 [06:42<02:28,  1.02s/it] 70%|██████▉   | 338/483 [06:43<02:10,  1.11it/s]                                                 [2024-07-23 17:50:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.58.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 70%|██████▉   | 338/483 [06:43<02:10,  1.11it/s] 70%|███████   | 339/483 [06:43<01:51,  1.29it/s]                                                 [2024-07-23 17:50:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 70%|███████   | 339/483 [06:43<01:51,  1.29it/s]                                                 [2024-07-23 17:50:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 70%|███████   | 339/483 [06:44<01:51,  1.29it/s] 71%|███████   | 341/483 [06:45<01:46,  1.33it/s]                                                 [2024-07-23 17:50:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 71%|███████   | 341/483 [06:46<01:46,  1.33it/s] 71%|███████   | 342/483 [06:48<03:04,  1.31s/it]                                                 [2024-07-23 17:50:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 71%|███████   | 342/483 [06:48<03:04,  1.31s/it] 71%|███████   | 343/483 [06:48<02:21,  1.01s/it]                                                 [2024-07-23 17:50:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 71%|███████   | 343/483 [06:48<02:21,  1.01s/it] 71%|███████   | 344/483 [06:48<02:03,  1.13it/s]                                                 [2024-07-23 17:50:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.59.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 71%|███████   | 344/483 [06:49<02:03,  1.13it/s] 71%|███████▏  | 345/483 [06:49<01:44,  1.33it/s]                                                 [2024-07-23 17:50:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 71%|███████▏  | 345/483 [06:49<01:44,  1.33it/s] 72%|███████▏  | 346/483 [06:49<01:34,  1.45it/s]                                                 [2024-07-23 17:50:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 72%|███████▏  | 346/483 [06:50<01:34,  1.45it/s] 72%|███████▏  | 347/483 [06:50<01:22,  1.65it/s]                                                 [2024-07-23 17:50:50] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00021-of-00030.safetensors
+ 72%|███████▏  | 347/483 [06:50<01:22,  1.65it/s]                                                 [2024-07-23 17:50:51] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00022-of-00030.safetensors
+ 72%|███████▏  | 347/483 [06:50<01:22,  1.65it/s]                                                 [2024-07-23 17:50:51] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00023-of-00030.safetensors
+ 72%|███████▏  | 347/483 [06:50<01:22,  1.65it/s]                                                 [2024-07-23 17:50:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 72%|███████▏  | 347/483 [06:52<01:22,  1.65it/s] 72%|███████▏  | 348/483 [06:52<02:41,  1.20s/it]                                                 [2024-07-23 17:50:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 72%|███████▏  | 348/483 [06:53<02:41,  1.20s/it] 72%|███████▏  | 349/483 [06:54<02:50,  1.27s/it]                                                 [2024-07-23 17:50:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 72%|███████▏  | 349/483 [06:55<02:50,  1.27s/it] 72%|███████▏  | 350/483 [06:57<04:04,  1.84s/it]                                                 [2024-07-23 17:50:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.60.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 72%|███████▏  | 350/483 [06:57<04:04,  1.84s/it] 73%|███████▎  | 351/483 [06:57<02:54,  1.32s/it]                                                 [2024-07-23 17:50:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 73%|███████▎  | 351/483 [06:57<02:54,  1.32s/it]                                                 [2024-07-23 17:50:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 73%|███████▎  | 351/483 [06:58<02:54,  1.32s/it] 73%|███████▎  | 353/483 [06:59<02:15,  1.04s/it]                                                 [2024-07-23 17:51:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 73%|███████▎  | 353/483 [07:00<02:15,  1.04s/it] 73%|███████▎  | 354/483 [07:02<03:22,  1.57s/it]                                                 [2024-07-23 17:51:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 73%|███████▎  | 354/483 [07:02<03:22,  1.57s/it] 73%|███████▎  | 355/483 [07:02<02:32,  1.19s/it]                                                 [2024-07-23 17:51:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 73%|███████▎  | 355/483 [07:02<02:32,  1.19s/it] 74%|███████▎  | 356/483 [07:02<02:08,  1.01s/it]                                                 [2024-07-23 17:51:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.61.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 74%|███████▎  | 356/483 [07:03<02:08,  1.01s/it] 74%|███████▍  | 357/483 [07:03<01:46,  1.18it/s]                                                 [2024-07-23 17:51:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 74%|███████▍  | 357/483 [07:03<01:46,  1.18it/s]                                                 [2024-07-23 17:51:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 74%|███████▍  | 357/483 [07:03<01:46,  1.18it/s] 74%|███████▍  | 359/483 [07:04<01:37,  1.27it/s]                                                 [2024-07-23 17:51:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 74%|███████▍  | 359/483 [07:06<01:37,  1.27it/s] 75%|███████▍  | 360/483 [07:07<02:48,  1.37s/it]                                                 [2024-07-23 17:51:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 75%|███████▍  | 360/483 [07:08<02:48,  1.37s/it] 75%|███████▍  | 361/483 [07:08<02:07,  1.04s/it]                                                 [2024-07-23 17:51:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 75%|███████▍  | 361/483 [07:08<02:07,  1.04s/it] 75%|███████▍  | 362/483 [07:08<01:49,  1.10it/s]                                                 [2024-07-23 17:51:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.62.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 75%|███████▍  | 362/483 [07:08<01:49,  1.10it/s] 75%|███████▌  | 363/483 [07:09<01:32,  1.29it/s]                                                 [2024-07-23 17:51:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 75%|███████▌  | 363/483 [07:09<01:32,  1.29it/s] 75%|███████▌  | 364/483 [07:09<01:24,  1.41it/s]                                                 [2024-07-23 17:51:10] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00023-of-00030.safetensors
+ 75%|███████▌  | 364/483 [07:09<01:24,  1.41it/s]                                                 [2024-07-23 17:51:10] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00024-of-00030.safetensors
+ 75%|███████▌  | 364/483 [07:09<01:24,  1.41it/s]                                                 [2024-07-23 17:51:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 75%|███████▌  | 364/483 [07:12<01:24,  1.41it/s] 76%|███████▌  | 365/483 [07:12<02:39,  1.35s/it]                                                 [2024-07-23 17:51:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 76%|███████▌  | 365/483 [07:13<02:39,  1.35s/it] 76%|███████▌  | 366/483 [07:13<02:41,  1.38s/it]                                                 [2024-07-23 17:51:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 76%|███████▌  | 366/483 [07:15<02:41,  1.38s/it] 76%|███████▌  | 367/483 [07:17<03:42,  1.92s/it]                                                 [2024-07-23 17:51:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 76%|███████▌  | 367/483 [07:17<03:42,  1.92s/it] 76%|███████▌  | 368/483 [07:17<02:38,  1.38s/it]                                                 [2024-07-23 17:51:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.63.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 76%|███████▌  | 368/483 [07:17<02:38,  1.38s/it] 76%|███████▋  | 369/483 [07:17<02:04,  1.09s/it]                                                 [2024-07-23 17:51:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 76%|███████▋  | 369/483 [07:17<02:04,  1.09s/it]                                                 [2024-07-23 17:51:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 76%|███████▋  | 369/483 [07:18<02:04,  1.09s/it] 77%|███████▋  | 371/483 [07:19<01:42,  1.09it/s]                                                 [2024-07-23 17:51:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 77%|███████▋  | 371/483 [07:20<01:42,  1.09it/s] 77%|███████▋  | 372/483 [07:22<02:44,  1.48s/it]                                                 [2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 77%|███████▋  | 372/483 [07:22<02:44,  1.48s/it] 77%|███████▋  | 373/483 [07:22<02:03,  1.12s/it]                                                 [2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 77%|███████▋  | 373/483 [07:22<02:03,  1.12s/it] 77%|███████▋  | 374/483 [07:22<01:44,  1.04it/s]                                                 [2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.64.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 77%|███████▋  | 374/483 [07:23<01:44,  1.04it/s] 78%|███████▊  | 375/483 [07:23<01:26,  1.25it/s]                                                 [2024-07-23 17:51:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 78%|███████▊  | 375/483 [07:23<01:26,  1.25it/s]                                                 [2024-07-23 17:51:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 78%|███████▊  | 375/483 [07:23<01:26,  1.25it/s] 78%|███████▊  | 377/483 [07:24<01:21,  1.31it/s]                                                 [2024-07-23 17:51:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 78%|███████▊  | 377/483 [07:26<01:21,  1.31it/s] 78%|███████▊  | 378/483 [07:27<02:21,  1.35s/it]                                                 [2024-07-23 17:51:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 78%|███████▊  | 378/483 [07:27<02:21,  1.35s/it] 78%|███████▊  | 379/483 [07:28<01:46,  1.03s/it]                                                 [2024-07-23 17:51:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 78%|███████▊  | 379/483 [07:28<01:46,  1.03s/it] 79%|███████▊  | 380/483 [07:28<01:32,  1.12it/s]                                                 [2024-07-23 17:51:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.65.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 79%|███████▊  | 380/483 [07:28<01:32,  1.12it/s] 79%|███████▉  | 381/483 [07:28<01:17,  1.32it/s]                                                 [2024-07-23 17:51:29] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00024-of-00030.safetensors
+ 79%|███████▉  | 381/483 [07:28<01:17,  1.32it/s]                                                 [2024-07-23 17:51:29] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00025-of-00030.safetensors
+ 79%|███████▉  | 381/483 [07:29<01:17,  1.32it/s]                                                 [2024-07-23 17:51:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 79%|███████▉  | 381/483 [07:31<01:17,  1.32it/s] 79%|███████▉  | 382/483 [07:31<01:56,  1.15s/it]                                                 [2024-07-23 17:51:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 79%|███████▉  | 382/483 [07:31<01:56,  1.15s/it] 79%|███████▉  | 383/483 [07:32<02:03,  1.24s/it]                                                 [2024-07-23 17:51:34] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 79%|███████▉  | 383/483 [07:33<02:03,  1.24s/it] 80%|███████▉  | 384/483 [07:35<02:58,  1.80s/it]                                                 [2024-07-23 17:51:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 80%|███████▉  | 384/483 [07:35<02:58,  1.80s/it] 80%|███████▉  | 385/483 [07:35<02:07,  1.30s/it]                                                 [2024-07-23 17:51:36] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 80%|███████▉  | 385/483 [07:36<02:07,  1.30s/it] 80%|███████▉  | 386/483 [07:36<01:44,  1.07s/it]                                                 [2024-07-23 17:51:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.66.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 80%|███████▉  | 386/483 [07:36<01:44,  1.07s/it] 80%|████████  | 387/483 [07:36<01:23,  1.14it/s]                                                 [2024-07-23 17:51:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 80%|████████  | 387/483 [07:36<01:23,  1.14it/s]                                                 [2024-07-23 17:51:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 80%|████████  | 387/483 [07:37<01:23,  1.14it/s] 81%|████████  | 389/483 [07:38<01:15,  1.25it/s]                                                 [2024-07-23 17:51:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 81%|████████  | 389/483 [07:39<01:15,  1.25it/s] 81%|████████  | 390/483 [07:41<02:09,  1.39s/it]                                                 [2024-07-23 17:51:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 81%|████████  | 390/483 [07:41<02:09,  1.39s/it] 81%|████████  | 391/483 [07:41<01:37,  1.06s/it]                                                 [2024-07-23 17:51:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 81%|████████  | 391/483 [07:41<01:37,  1.06s/it] 81%|████████  | 392/483 [07:42<01:23,  1.09it/s]                                                 [2024-07-23 17:51:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.67.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 81%|████████  | 392/483 [07:42<01:23,  1.09it/s] 81%|████████▏ | 393/483 [07:42<01:09,  1.30it/s]                                                 [2024-07-23 17:51:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 81%|████████▏ | 393/483 [07:43<01:09,  1.30it/s] 82%|████████▏ | 394/483 [07:45<02:09,  1.46s/it]                                                 [2024-07-23 17:51:46] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 82%|████████▏ | 394/483 [07:45<02:09,  1.46s/it] 82%|████████▏ | 395/483 [07:46<01:47,  1.22s/it]                                                 [2024-07-23 17:51:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 82%|████████▏ | 395/483 [07:46<01:47,  1.22s/it] 82%|████████▏ | 396/483 [07:46<01:25,  1.02it/s]                                                 [2024-07-23 17:51:47] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00025-of-00030.safetensors
+ 82%|████████▏ | 396/483 [07:46<01:25,  1.02it/s]                                                 [2024-07-23 17:51:47] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00026-of-00030.safetensors
+ 82%|████████▏ | 396/483 [07:46<01:25,  1.02it/s]                                                 [2024-07-23 17:51:49] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 82%|████████▏ | 396/483 [07:49<01:25,  1.02it/s] 82%|████████▏ | 397/483 [07:49<02:05,  1.45s/it]                                                 [2024-07-23 17:51:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 82%|████████▏ | 397/483 [07:49<02:05,  1.45s/it] 82%|████████▏ | 398/483 [07:50<02:03,  1.45s/it]                                                 [2024-07-23 17:51:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.68.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 82%|████████▏ | 398/483 [07:50<02:03,  1.45s/it]                                                 [2024-07-23 17:51:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 82%|████████▏ | 398/483 [07:50<02:03,  1.45s/it]                                                 [2024-07-23 17:51:51] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 82%|████████▏ | 398/483 [07:51<02:03,  1.45s/it] 83%|████████▎ | 401/483 [07:52<01:14,  1.10it/s]                                                 [2024-07-23 17:51:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 83%|████████▎ | 401/483 [07:53<01:14,  1.10it/s] 83%|████████▎ | 402/483 [07:55<01:52,  1.39s/it]                                                 [2024-07-23 17:51:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 83%|████████▎ | 402/483 [07:55<01:52,  1.39s/it] 83%|████████▎ | 403/483 [07:55<01:27,  1.09s/it]                                                 [2024-07-23 17:51:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 83%|████████▎ | 403/483 [07:55<01:27,  1.09s/it] 84%|████████▎ | 404/483 [07:55<01:15,  1.05it/s]                                                 [2024-07-23 17:51:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.69.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 84%|████████▎ | 404/483 [07:56<01:15,  1.05it/s] 84%|████████▍ | 405/483 [07:56<01:03,  1.24it/s]                                                 [2024-07-23 17:51:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 84%|████████▍ | 405/483 [07:56<01:03,  1.24it/s]                                                 [2024-07-23 17:51:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 84%|████████▍ | 405/483 [07:56<01:03,  1.24it/s] 84%|████████▍ | 407/483 [07:57<00:58,  1.30it/s]                                                 [2024-07-23 17:51:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 84%|████████▍ | 407/483 [07:59<00:58,  1.30it/s] 84%|████████▍ | 408/483 [08:01<01:40,  1.34s/it]                                                 [2024-07-23 17:52:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 84%|████████▍ | 408/483 [08:01<01:40,  1.34s/it] 85%|████████▍ | 409/483 [08:01<01:16,  1.03s/it]                                                 [2024-07-23 17:52:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 85%|████████▍ | 409/483 [08:01<01:16,  1.03s/it] 85%|████████▍ | 410/483 [08:01<01:05,  1.11it/s]                                                 [2024-07-23 17:52:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.70.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 85%|████████▍ | 410/483 [08:01<01:05,  1.11it/s] 85%|████████▌ | 411/483 [08:02<00:54,  1.31it/s]                                                 [2024-07-23 17:52:02] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors
+ 85%|████████▌ | 411/483 [08:02<00:54,  1.31it/s]                                                 [2024-07-23 17:52:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 85%|████████▌ | 411/483 [08:07<00:54,  1.31it/s] 85%|████████▌ | 412/483 [08:09<03:03,  2.58s/it]                                                 [2024-07-23 17:52:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 85%|████████▌ | 412/483 [08:09<03:03,  2.58s/it] 86%|████████▌ | 413/483 [08:09<02:21,  2.02s/it]                                                 [2024-07-23 17:52:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 86%|████████▌ | 413/483 [08:10<02:21,  2.02s/it] 86%|████████▌ | 414/483 [08:10<01:47,  1.55s/it]                                                 [2024-07-23 17:52:10] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00026-of-00030.safetensors
+ 86%|████████▌ | 414/483 [08:10<01:47,  1.55s/it]                                                 [2024-07-23 17:52:11] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors
+ 86%|████████▌ | 414/483 [08:10<01:47,  1.55s/it]                                                 [2024-07-23 17:52:11] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00004-of-00030.safetensors
+ 86%|████████▌ | 414/483 [08:10<01:47,  1.55s/it]                                                 [2024-07-23 17:52:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 86%|████████▌ | 414/483 [08:12<01:47,  1.55s/it] 86%|████████▌ | 415/483 [08:12<02:04,  1.84s/it]                                                 [2024-07-23 17:52:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 86%|████████▌ | 415/483 [08:13<02:04,  1.84s/it] 86%|████████▌ | 416/483 [08:14<01:55,  1.72s/it]                                                 [2024-07-23 17:52:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 86%|████████▌ | 416/483 [08:15<01:55,  1.72s/it] 86%|████████▋ | 417/483 [08:17<02:21,  2.15s/it]                                                 [2024-07-23 17:52:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 86%|████████▋ | 417/483 [08:17<02:21,  2.15s/it] 87%|████████▋ | 418/483 [08:17<01:40,  1.54s/it]                                                 [2024-07-23 17:52:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.7.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 87%|████████▋ | 418/483 [08:17<01:40,  1.54s/it] 87%|████████▋ | 419/483 [08:17<01:17,  1.21s/it]                                                 [2024-07-23 17:52:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 87%|████████▋ | 419/483 [08:17<01:17,  1.21s/it]                                                 [2024-07-23 17:52:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 87%|████████▋ | 419/483 [08:18<01:17,  1.21s/it] 87%|████████▋ | 421/483 [08:19<01:00,  1.02it/s]                                                 [2024-07-23 17:52:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 87%|████████▋ | 421/483 [08:20<01:00,  1.02it/s] 87%|████████▋ | 422/483 [08:22<01:33,  1.53s/it]                                                 [2024-07-23 17:52:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 87%|████████▋ | 422/483 [08:22<01:33,  1.53s/it] 88%|████████▊ | 423/483 [08:22<01:09,  1.16s/it]                                                 [2024-07-23 17:52:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 88%|████████▊ | 423/483 [08:22<01:09,  1.16s/it] 88%|████████▊ | 424/483 [08:23<00:58,  1.02it/s]                                                 [2024-07-23 17:52:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.8.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 88%|████████▊ | 424/483 [08:23<00:58,  1.02it/s] 88%|████████▊ | 425/483 [08:23<00:47,  1.22it/s]                                                 [2024-07-23 17:52:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 88%|████████▊ | 425/483 [08:23<00:47,  1.22it/s]                                                 [2024-07-23 17:52:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 88%|████████▊ | 425/483 [08:24<00:47,  1.22it/s] 88%|████████▊ | 427/483 [08:25<00:43,  1.29it/s]                                                 [2024-07-23 17:52:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 88%|████████▊ | 427/483 [08:26<00:43,  1.29it/s] 89%|████████▊ | 428/483 [08:28<01:14,  1.36s/it]                                                 [2024-07-23 17:52:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 89%|████████▊ | 428/483 [08:28<01:14,  1.36s/it] 89%|████████▉ | 429/483 [08:28<00:56,  1.04s/it]                                                 [2024-07-23 17:52:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 89%|████████▉ | 429/483 [08:28<00:56,  1.04s/it] 89%|████████▉ | 430/483 [08:28<00:47,  1.11it/s]                                                 [2024-07-23 17:52:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.9.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 89%|████████▉ | 430/483 [08:29<00:47,  1.11it/s] 89%|████████▉ | 431/483 [08:29<00:39,  1.31it/s]                                                 [2024-07-23 17:52:29] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00004-of-00030.safetensors
+ 89%|████████▉ | 431/483 [08:29<00:39,  1.31it/s]                                                 [2024-07-23 17:52:30] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors
+ 89%|████████▉ | 431/483 [08:29<00:39,  1.31it/s]                                                 [2024-07-23 17:52:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 89%|████████▉ | 431/483 [08:30<00:39,  1.31it/s] 89%|████████▉ | 432/483 [08:30<00:48,  1.06it/s]                                                 [2024-07-23 17:52:31] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 89%|████████▉ | 432/483 [08:31<00:48,  1.06it/s] 90%|████████▉ | 433/483 [08:32<00:54,  1.09s/it]                                                 [2024-07-23 17:52:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.71.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 90%|████████▉ | 433/483 [08:32<00:54,  1.09s/it]                                                 [2024-07-23 17:52:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 90%|████████▉ | 433/483 [08:32<00:54,  1.09s/it]                                                 [2024-07-23 17:52:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 90%|████████▉ | 433/483 [08:32<00:54,  1.09s/it] 90%|█████████ | 436/483 [08:33<00:35,  1.32it/s]                                                 [2024-07-23 17:52:35] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 90%|█████████ | 436/483 [08:34<00:35,  1.32it/s] 90%|█████████ | 437/483 [08:36<00:57,  1.26s/it]                                                 [2024-07-23 17:52:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 90%|█████████ | 437/483 [08:36<00:57,  1.26s/it] 91%|█████████ | 438/483 [08:36<00:44,  1.01it/s]                                                 [2024-07-23 17:52:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 91%|█████████ | 438/483 [08:37<00:44,  1.01it/s] 91%|█████████ | 439/483 [08:37<00:38,  1.14it/s]                                                 [2024-07-23 17:52:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.72.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 91%|█████████ | 439/483 [08:37<00:38,  1.14it/s] 91%|█████████ | 440/483 [08:37<00:32,  1.33it/s]                                                 [2024-07-23 17:52:38] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 91%|█████████ | 440/483 [08:37<00:32,  1.33it/s]                                                 [2024-07-23 17:52:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 91%|█████████ | 440/483 [08:38<00:32,  1.33it/s] 92%|█████████▏| 442/483 [08:39<00:30,  1.35it/s]                                                 [2024-07-23 17:52:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 92%|█████████▏| 442/483 [08:40<00:30,  1.35it/s] 92%|█████████▏| 443/483 [08:42<00:52,  1.31s/it]                                                 [2024-07-23 17:52:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 92%|█████████▏| 443/483 [08:42<00:52,  1.31s/it] 92%|█████████▏| 444/483 [08:42<00:39,  1.01s/it]                                                 [2024-07-23 17:52:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 92%|█████████▏| 444/483 [08:42<00:39,  1.01s/it] 92%|█████████▏| 445/483 [08:43<00:33,  1.13it/s]                                                 [2024-07-23 17:52:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.73.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 92%|█████████▏| 445/483 [08:43<00:33,  1.13it/s] 92%|█████████▏| 446/483 [08:43<00:27,  1.33it/s]                                                 [2024-07-23 17:52:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 92%|█████████▏| 446/483 [08:43<00:27,  1.33it/s] 93%|█████████▎| 447/483 [08:43<00:24,  1.46it/s]                                                 [2024-07-23 17:52:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 93%|█████████▎| 447/483 [08:44<00:24,  1.46it/s] 93%|█████████▎| 448/483 [08:44<00:21,  1.65it/s]                                                 [2024-07-23 17:52:45] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00027-of-00030.safetensors
+ 93%|█████████▎| 448/483 [08:44<00:21,  1.65it/s]                                                 [2024-07-23 17:52:45] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00028-of-00030.safetensors
+ 93%|█████████▎| 448/483 [08:44<00:21,  1.65it/s]                                                 [2024-07-23 17:52:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 93%|█████████▎| 448/483 [08:47<00:21,  1.65it/s] 93%|█████████▎| 449/483 [08:47<00:43,  1.28s/it]                                                 [2024-07-23 17:52:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 93%|█████████▎| 449/483 [08:47<00:43,  1.28s/it] 93%|█████████▎| 450/483 [08:48<00:43,  1.33s/it]                                                 [2024-07-23 17:52:50] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 93%|█████████▎| 450/483 [08:50<00:43,  1.33s/it] 93%|█████████▎| 451/483 [08:51<01:00,  1.89s/it]                                                 [2024-07-23 17:52:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.74.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 93%|█████████▎| 451/483 [08:52<01:00,  1.89s/it] 94%|█████████▎| 452/483 [08:52<00:42,  1.36s/it]                                                 [2024-07-23 17:52:52] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 94%|█████████▎| 452/483 [08:52<00:42,  1.36s/it]                                                 [2024-07-23 17:52:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 94%|█████████▎| 452/483 [08:52<00:42,  1.36s/it] 94%|█████████▍| 454/483 [08:53<00:30,  1.06s/it]                                                 [2024-07-23 17:52:55] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 94%|█████████▍| 454/483 [08:54<00:30,  1.06s/it] 94%|█████████▍| 455/483 [08:56<00:44,  1.58s/it]                                                 [2024-07-23 17:52:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 94%|█████████▍| 455/483 [08:56<00:44,  1.58s/it] 94%|█████████▍| 456/483 [08:56<00:32,  1.20s/it]                                                 [2024-07-23 17:52:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 94%|█████████▍| 456/483 [08:57<00:32,  1.20s/it] 95%|█████████▍| 457/483 [08:57<00:26,  1.02s/it]                                                 [2024-07-23 17:52:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.75.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 95%|█████████▍| 457/483 [08:57<00:26,  1.02s/it] 95%|█████████▍| 458/483 [08:57<00:21,  1.17it/s]                                                 [2024-07-23 17:52:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 95%|█████████▍| 458/483 [08:57<00:21,  1.17it/s]                                                 [2024-07-23 17:52:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 95%|█████████▍| 458/483 [08:58<00:21,  1.17it/s] 95%|█████████▌| 460/483 [08:59<00:18,  1.26it/s]                                                 [2024-07-23 17:53:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 95%|█████████▌| 460/483 [09:00<00:18,  1.26it/s] 95%|█████████▌| 461/483 [09:02<00:30,  1.37s/it]                                                 [2024-07-23 17:53:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 95%|█████████▌| 461/483 [09:02<00:30,  1.37s/it] 96%|█████████▌| 462/483 [09:02<00:21,  1.05s/it]                                                 [2024-07-23 17:53:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 96%|█████████▌| 462/483 [09:02<00:21,  1.05s/it] 96%|█████████▌| 463/483 [09:03<00:18,  1.10it/s]                                                 [2024-07-23 17:53:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.76.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 96%|█████████▌| 463/483 [09:03<00:18,  1.10it/s] 96%|█████████▌| 464/483 [09:03<00:14,  1.29it/s]                                                 [2024-07-23 17:53:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 96%|█████████▌| 464/483 [09:03<00:14,  1.29it/s] 96%|█████████▋| 465/483 [09:03<00:12,  1.41it/s]                                                 [2024-07-23 17:53:04] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00028-of-00030.safetensors
+ 96%|█████████▋| 465/483 [09:03<00:12,  1.41it/s]                                                 [2024-07-23 17:53:04] INFO huggingface_loader.py:185: Loading HF parameters from: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00029-of-00030.safetensors
+ 96%|█████████▋| 465/483 [09:04<00:12,  1.41it/s]                                                 [2024-07-23 17:53:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 96%|█████████▋| 465/483 [09:06<00:12,  1.41it/s] 96%|█████████▋| 466/483 [09:06<00:23,  1.36s/it]                                                 [2024-07-23 17:53:08] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 96%|█████████▋| 466/483 [09:07<00:23,  1.36s/it] 97%|█████████▋| 467/483 [09:08<00:22,  1.39s/it]                                                 [2024-07-23 17:53:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 97%|█████████▋| 467/483 [09:09<00:22,  1.39s/it] 97%|█████████▋| 468/483 [09:11<00:28,  1.92s/it]                                                 [2024-07-23 17:53:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 97%|█████████▋| 468/483 [09:11<00:28,  1.92s/it] 97%|█████████▋| 469/483 [09:11<00:19,  1.38s/it]                                                 [2024-07-23 17:53:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.77.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 97%|█████████▋| 469/483 [09:11<00:19,  1.38s/it] 97%|█████████▋| 470/483 [09:12<00:14,  1.09s/it]                                                 [2024-07-23 17:53:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 97%|█████████▋| 470/483 [09:12<00:14,  1.09s/it]                                                 [2024-07-23 17:53:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 97%|█████████▋| 470/483 [09:12<00:14,  1.09s/it] 98%|█████████▊| 472/483 [09:13<00:10,  1.09it/s]                                                 [2024-07-23 17:53:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 98%|█████████▊| 472/483 [09:14<00:10,  1.09it/s] 98%|█████████▊| 473/483 [09:16<00:14,  1.48s/it]                                                 [2024-07-23 17:53:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 98%|█████████▊| 473/483 [09:16<00:14,  1.48s/it] 98%|█████████▊| 474/483 [09:16<00:10,  1.12s/it]                                                 [2024-07-23 17:53:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 98%|█████████▊| 474/483 [09:17<00:10,  1.12s/it] 98%|█████████▊| 475/483 [09:17<00:07,  1.04it/s]                                                 [2024-07-23 17:53:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.78.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+ 98%|█████████▊| 475/483 [09:17<00:07,  1.04it/s] 99%|█████████▊| 476/483 [09:17<00:05,  1.25it/s]                                                 [2024-07-23 17:53:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.input_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 99%|█████████▊| 476/483 [09:17<00:05,  1.25it/s]                                                 [2024-07-23 17:53:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.mlp.down_proj.weight[0m", shape: (8192, 28672), dtype: float16
+ 99%|█████████▊| 476/483 [09:18<00:05,  1.25it/s] 99%|█████████▉| 478/483 [09:19<00:03,  1.30it/s]                                                 [2024-07-23 17:53:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.mlp.gate_up_proj.weight[0m", shape: (57344, 8192), dtype: float16
+ 99%|█████████▉| 478/483 [09:20<00:03,  1.30it/s] 99%|█████████▉| 479/483 [09:22<00:05,  1.35s/it]                                                 [2024-07-23 17:53:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16
+ 99%|█████████▉| 479/483 [09:22<00:05,  1.35s/it] 99%|█████████▉| 480/483 [09:22<00:03,  1.03s/it]                                                 [2024-07-23 17:53:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.self_attn.qkv_proj.weight[0m", shape: (10240, 8192), dtype: float16
+ 99%|█████████▉| 480/483 [09:22<00:03,  1.03s/it]100%|█████████▉| 481/483 [09:23<00:01,  1.11it/s]                                                 [2024-07-23 17:53:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.layers.79.self_attn.o_proj.weight[0m", shape: (8192, 8192), dtype: float16
+100%|█████████▉| 481/483 [09:23<00:01,  1.11it/s]100%|█████████▉| 482/483 [09:23<00:00,  1.32it/s]                                                 [2024-07-23 17:53:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "[1mmodel.norm.weight[0m", shape: (8192,), dtype: float16
+100%|█████████▉| 482/483 [09:23<00:00,  1.32it/s]100%|██████████| 483/483 [09:23<00:00,  1.17s/it]
+[2024-07-23 17:53:24] INFO huggingface_loader.py:197: Unloading HF weight file: /Users/Shared/models/Meta-Llama-3.1-70B-Instruct/model-00029-of-00030.safetensors
+[2024-07-23 17:53:24] INFO stats.py:77: Time usage: HF loading: 82.243 sec; Pre-quantization mapping: 178.396 sec; Quantization: 0.000 sec
+[2024-07-23 17:53:24] INFO stats.py:91: RAM usage: Peak RAM: 17.375 GB. Total bytes loaded from disk: 271.521 GB
+[2024-07-23 17:53:24] INFO convert_weight.py:155: Parameter size after quantization: 131.417 GB
+[2024-07-23 17:53:24] INFO convert_weight.py:160: Total parameters: 72,885,788,672
+[2024-07-23 17:53:24] INFO convert_weight.py:161: Bits per parameter: 15.488
+[2024-07-23 17:53:24] INFO convert_weight.py:166: Saved to directory: local_dir/Llama-3.1-70B-Instruct-q0f16-MLC
+
+All finished, 323 total shards committed, record saved to local_dir/Llama-3.1-70B-Instruct-q0f16-MLC/ndarray-cache.json