SPRINGLab/F5-Hindi-24KHz · getting this error please help

vocab : C:\Users\simra\Downloads\vocab.txt
token : custom
model : C:\Users\simra\Downloads\model_2500000.safetensors

Traceback (most recent call last):
File "C:\Users\simra.conda\envs\f5-tts\lib\site-packages\gradio\queueing.py", line 624, in process_events
response = await route_utils.call_process_api(
File "C:\Users\simra.conda\envs\f5-tts\lib\site-packages\gradio\route_utils.py", line 323, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\simra.conda\envs\f5-tts\lib\site-packages\gradio\blocks.py", line 2016, in process_api
result = await self.call_function(
File "C:\Users\simra.conda\envs\f5-tts\lib\site-packages\gradio\blocks.py", line 1563, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "C:\Users\simra.conda\envs\f5-tts\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\Users\simra.conda\envs\f5-tts\lib\site-packages\anyio_backends_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
File "C:\Users\simra.conda\envs\f5-tts\lib\site-packages\anyio_backends_asyncio.py", line 943, in run
result = context.run(func, *args)
File "C:\Users\simra.conda\envs\f5-tts\lib\site-packages\gradio\utils.py", line 865, in wrapper
response = f(*args, **kwargs)
File "C:\Users\simra\Downloads\F5-TTS-main\F5-TTS-main\src\f5_tts\infer\infer_gradio.py", line 208, in basic_tts
audio_out, spectrogram_path, ref_text_out = infer(
File "C:\Users\simra\Downloads\F5-TTS-main\F5-TTS-main\src\f5_tts\infer\infer_gradio.py", line 123, in infer
custom_ema_model = load_custom(model[1], vocab_path=model[2])
File "C:\Users\simra\Downloads\F5-TTS-main\F5-TTS-main\src\f5_tts\infer\infer_gradio.py", line 70, in load_custom
return load_model(DiT, model_cfg, ckpt_path, vocab_file=vocab_path)
File "C:\Users\simra\Downloads\F5-TTS-main\F5-TTS-main\src\f5_tts\infer\utils_infer.py", line 251, in load_model
model = load_checkpoint(model, ckpt_path, device, dtype=dtype, use_ema=use_ema)
File "C:\Users\simra\Downloads\F5-TTS-main\F5-TTS-main\src\f5_tts\infer\utils_infer.py", line 200, in load_checkpoint
model.load_state_dict(checkpoint["model_state_dict"])
File "C:\Users\simra.conda\envs\f5-tts\lib\site-packages\torch\nn\modules\module.py", line 2215, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CFM:
Missing key(s) in state_dict: "transformer.transformer_blocks.18.attn_norm.linear.weight", "transformer.transformer_blocks.18.attn_norm.linear.bias", "transformer.transformer_blocks.18.attn.to_q.weight", "transformer.transformer_blocks.18.attn.to_q.bias", "transformer.transformer_blocks.18.attn.to_k.weight", "transformer.transformer_blocks.18.attn.to_k.bias", "transformer.transformer_blocks.18.attn.to_v.weight", "transformer.transformer_blocks.18.attn.to_v.bias", "transformer.transformer_blocks.18.attn.to_out.0.weight", "transformer.transformer_blocks.18.attn.to_out.0.bias", "transformer.transformer_blocks.18.ff.ff.0.0.weight", "transformer.transformer_blocks.18.ff.ff.0.0.bias", "transformer.transformer_blocks.18.ff.ff.2.weight", "transformer.transformer_blocks.18.ff.ff.2.bias", "transformer.transformer_blocks.19.attn_norm.linear.weight", "transformer.transformer_blocks.19.attn_norm.linear.bias", "transformer.transformer_blocks.19.attn.to_q.weight", "transformer.transformer_blocks.19.attn.to_q.bias", "transformer.transformer_blocks.19.attn.to_k.weight", "transformer.transformer_blocks.19.attn.to_k.bias", "transformer.transformer_blocks.19.attn.to_v.weight", "transformer.transformer_blocks.19.attn.to_v.bias", "transformer.transformer_blocks.19.attn.to_out.0.weight", "transformer.transformer_blocks.19.attn.to_out.0.bias", "transformer.transformer_blocks.19.ff.ff.0.0.weight", "transformer.transformer_blocks.19.ff.ff.0.0.bias", "transformer.transformer_blocks.19.ff.ff.2.weight", "transformer.transformer_blocks.19.ff.ff.2.bias", "transformer.transformer_blocks.20.attn_norm.linear.weight", "transformer.transformer_blocks.20.attn_norm.linear.bias", "transformer.transformer_blocks.20.attn.to_q.weight", "transformer.transformer_blocks.20.attn.to_q.bias", "transformer.transformer_blocks.20.attn.to_k.weight", "transformer.transformer_blocks.20.attn.to_k.bias", "transformer.transformer_blocks.20.attn.to_v.weight", "transformer.transformer_blocks.20.attn.to_v.bias", "transformer.transformer_blocks.20.attn.to_out.0.weight", "transformer.transformer_blocks.20.attn.to_out.0.bias", "transformer.transformer_blocks.20.ff.ff.0.0.weight", "transformer.transformer_blocks.20.ff.ff.0.0.bias", "transformer.transformer_blocks.20.ff.ff.2.weight", "transformer.transformer_blocks.20.ff.ff.2.bias", "transformer.transformer_blocks.21.attn_norm.linear.weight", "transformer.transformer_blocks.21.attn_norm.linear.bias", "transformer.transformer_blocks.21.attn.to_q.weight", "transformer.transformer_blocks.21.attn.to_q.bias", "transformer.transformer_blocks.21.attn.to_k.weight", "transformer.transformer_blocks.21.attn.to_k.bias", "transformer.transformer_blocks.21.attn.to_v.weight", "transformer.transformer_blocks.21.attn.to_v.bias", "transformer.transformer_blocks.21.attn.to_out.0.weight", "transformer.transformer_blocks.21.attn.to_out.0.bias", "transformer.transformer_blocks.21.ff.ff.0.0.weight", "transformer.transformer_blocks.21.ff.ff.0.0.bias", "transformer.transformer_blocks.21.ff.ff.2.weight", "transformer.transformer_blocks.21.ff.ff.2.bias".
size mismatch for transformer.time_embed.time_mlp.0.weight: copying a param with shape torch.Size([768, 256]) from checkpoint, the shape in current model is torch.Size([1024, 256]).
size mismatch for transformer.time_embed.time_mlp.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.time_embed.time_mlp.2.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.time_embed.time_mlp.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.input_embed.proj.weight: copying a param with shape torch.Size([768, 712]) from checkpoint, the shape in current model is torch.Size([1024, 712]).
size mismatch for transformer.input_embed.proj.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.input_embed.conv_pos_embed.conv1d.0.weight: copying a param with shape torch.Size([768, 48, 31]) from checkpoint, the shape in current model is torch.Size([1024, 64, 31]).
size mismatch for transformer.input_embed.conv_pos_embed.conv1d.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.input_embed.conv_pos_embed.conv1d.2.weight: copying a param with shape torch.Size([768, 48, 31]) from checkpoint, the shape in current model is torch.Size([1024, 64, 31]).
size mismatch for transformer.input_embed.conv_pos_embed.conv1d.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.0.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.0.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.0.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.0.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.0.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.0.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.0.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.0.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.0.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.0.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.0.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.0.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.0.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.0.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.1.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.1.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.1.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.1.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.1.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.1.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.1.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.1.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.1.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.1.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.1.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.1.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.1.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.1.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.2.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.2.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.2.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.2.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.2.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.2.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.2.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.2.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.2.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.2.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.2.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.2.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.2.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.2.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.3.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.3.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.3.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.3.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.3.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.3.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.3.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.3.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.3.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.3.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.3.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.3.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.3.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.3.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.4.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.4.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.4.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.4.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.4.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.4.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.4.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.4.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.4.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.4.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.4.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.4.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.4.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.4.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.5.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.5.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.5.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.5.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.5.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.5.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.5.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.5.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.5.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.5.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.5.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.5.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.5.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.5.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.6.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.6.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.6.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.6.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.6.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.6.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.6.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.6.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.6.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.6.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.6.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.6.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.6.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.6.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.7.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.7.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.7.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.7.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.7.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.7.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.7.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.7.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.7.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.7.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.7.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.7.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.7.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.7.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.8.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.8.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.8.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.8.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.8.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.8.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.8.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.8.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.8.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.8.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.8.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.8.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.8.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.8.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.9.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.9.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.9.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.9.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.9.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.9.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.9.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.9.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.9.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.9.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.9.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.9.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.9.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.9.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.10.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.10.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.10.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.10.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.10.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.10.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.10.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.10.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.10.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.10.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.10.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.10.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.10.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.10.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.11.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.11.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.11.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.11.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.11.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.11.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.11.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.11.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.11.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.11.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.11.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.11.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.11.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.11.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.12.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.12.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.12.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.12.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.12.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.12.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.12.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.12.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.12.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.12.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.12.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.12.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.12.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.12.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.13.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.13.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.13.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.13.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.13.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.13.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.13.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.13.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.13.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.13.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.13.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.13.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.13.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.13.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.14.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.14.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.14.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.14.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.14.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.14.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.14.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.14.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.14.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.14.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.14.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.14.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.14.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.14.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.15.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.15.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.15.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.15.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.15.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.15.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.15.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.15.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.15.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.15.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.15.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.15.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.15.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.15.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.16.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.16.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.16.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.16.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.16.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.16.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.16.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.16.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.16.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.16.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.16.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.16.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.16.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.16.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.17.attn_norm.linear.weight: copying a param with shape torch.Size([4608, 768]) from checkpoint, the shape in current model is torch.Size([6144, 1024]).
size mismatch for transformer.transformer_blocks.17.attn_norm.linear.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([6144]).
size mismatch for transformer.transformer_blocks.17.attn.to_q.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.17.attn.to_q.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.17.attn.to_k.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.17.attn.to_k.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.17.attn.to_v.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.17.attn.to_v.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.17.attn.to_out.0.weight: copying a param with shape torch.Size([768, 768]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
size mismatch for transformer.transformer_blocks.17.attn.to_out.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.transformer_blocks.17.ff.ff.0.0.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.transformer_blocks.17.ff.ff.0.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.transformer_blocks.17.ff.ff.2.weight: copying a param with shape torch.Size([768, 1536]) from checkpoint, the shape in current model is torch.Size([1024, 2048]).
size mismatch for transformer.transformer_blocks.17.ff.ff.2.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]).
size mismatch for transformer.norm_out.linear.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 1024]).
size mismatch for transformer.norm_out.linear.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([2048]).
size mismatch for transformer.proj_out.weight: copying a param with shape torch.Size([100, 768]) from checkpoint, the shape in current model is torch.Size([100, 1024]).