RuntimeError: Error(s) in loading state_dict for MixFormerSequentialForCausalLM
#14 · by nmd2k · opened
Hi, I recently fine-tuned phi-1.5, but I wasn't able to load its checkpoint afterwards. It seems like the config of MixFormerSequentialForCausalLM has been modified.
Detailed log:
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[2023-09-13 13:10:40,735] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-09-13 13:10:41,210] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
loading configuration file /datadrive05/dungnm31/Exp/phi15/checkpoint-3450/config.json
loading configuration file /datadrive05/dungnm31/Exp/phi15/checkpoint-3450/config.json
Model config MixFormerSequentialConfig {
  "_name_or_path": "/datadrive05/dungnm31/Exp/phi15/checkpoint-3450/",
  "activation_function": "gelu_new",
  "architecture": {
    "block_cls": "parallel",
    "mixer": {},
    "mlp": {
      "mlp_cls": "mlp"
    }
  },
  "architectures": [
    "MixFormerSequentialForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "microsoft/phi-1_5--configuration_mixformer_sequential.MixFormerSequentialConfig",
    "AutoModelForCausalLM": "microsoft/phi-1_5--modeling_mixformer_sequential.MixFormerSequentialForCausalLM"
  },
  "embd_layer": "default",
  "embd_pdrop": 0.0,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "mixformer-sequential",
  "n_embd": 2048,
  "n_head": 32,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 2048,
  "phyagi_version": "0.0.4.dev",
  "resid_pdrop": 0.0,
  "rotary_dim": 32,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.34.0.dev0",
  "vocab_size": 50304
}
loading file vocab.json
loading file merges.txt
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading weights file /datadrive05/dungnm31/Exp/phi15/checkpoint-3450/pytorch_model.bin
Generate config GenerationConfig {
"_from_model_config": true,
"transformers_version": "4.34.0.dev0"
}
Traceback (most recent call last):
File "/datadrive05/dungnm31/inst/main.py", line 150, in <module>
main()
File "/datadrive05/dungnm31/inst/main.py", line 91, in main
model = AutoModelForCausalLM.from_pretrained(model_args.model_name_or_path,
File "/home/dungnm31/.local/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
return model_class.from_pretrained(
File "/home/dungnm31/.local/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3180, in from_pretrained
) = cls._load_pretrained_model(
File "/home/dungnm31/.local/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3629, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for MixFormerSequentialForCausalLM:
size mismatch for layers.0.wte.weight: copying a param with shape torch.Size([50296, 2048]) from checkpoint, the shape in current model is torch.Size([50304, 2048]).
size mismatch for layers.25.linear.weight: copying a param with shape torch.Size([50296, 2048]) from checkpoint, the shape in current model is torch.Size([50304, 2048]).
size mismatch for layers.25.linear.bias: copying a param with shape torch.Size([50296]) from checkpoint, the shape in current model is torch.Size([50304]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
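Before reaching for `ignore_mismatched_sizes=True` (which would re-initialize the mismatched rows and discard the fine-tuned embedding and head weights), it helps to see where the two numbers come from. My reading, not confirmed anywhere in this thread: 50304 is the GPT-2 tokenizer's 50257 entries rounded up to a multiple of 64, a common padding for GPU throughput, while the fine-tuned checkpoint was saved with only 50296 rows:

```python
# Hypothetical diagnosis of the size mismatch (my assumption, not from the
# thread): the config's vocab_size of 50304 looks like the GPT-2 vocabulary
# size (50257) padded up to a multiple of 64, while the checkpoint tensors
# were saved with 50296 rows.
def round_up(n: int, multiple: int) -> int:
    """Round n up to the nearest multiple."""
    return ((n + multiple - 1) // multiple) * multiple

print(round_up(50257, 64))   # 50304, the vocab_size in the config above
print(50304 - 50296)         # 8 rows missing from the checkpoint tensors
```

If this reading is right, the checkpoint and the freshly-built model simply disagree by 8 vocabulary rows, which matches all three shape errors in the traceback.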
nmd2k changed discussion status to closed
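One possible way out (a sketch under my own assumptions, not a fix verified in this thread) is to zero-pad the three mismatched tensors in the saved state dict up to the config's `vocab_size` before loading, instead of letting `from_pretrained` re-initialize them. `pad_vocab_dim` is a hypothetical helper, and the shapes below are scaled down for illustration; in the real checkpoint they would be `(50296, 2048)` and `(50296,)` versus a target of 50304:

```python
# Sketch: pad the vocab dimension (dim 0) of the mismatched checkpoint
# tensors so load_state_dict sees the shapes the model expects.
import torch

def pad_vocab_dim(tensor: torch.Tensor, target_vocab: int) -> torch.Tensor:
    """Zero-pad dim 0 of a vocab-sized tensor up to target_vocab rows."""
    current = tensor.shape[0]
    if current >= target_vocab:
        return tensor[:target_vocab]
    pad_shape = (target_vocab - current,) + tuple(tensor.shape[1:])
    pad = torch.zeros(pad_shape, dtype=tensor.dtype)
    return torch.cat([tensor, pad], dim=0)

# Toy stand-ins for the three mismatched entries named in the traceback.
state_dict = {
    "layers.0.wte.weight": torch.randn(56, 8),      # real: (50296, 2048)
    "layers.25.linear.weight": torch.randn(56, 8),  # real: (50296, 2048)
    "layers.25.linear.bias": torch.randn(56),       # real: (50296,)
}

TARGET_VOCAB = 64  # stands in for the config's vocab_size of 50304
for name in list(state_dict):
    state_dict[name] = pad_vocab_dim(state_dict[name], TARGET_VOCAB)

print(state_dict["layers.0.wte.weight"].shape)  # torch.Size([64, 8])
```

Applied to the real checkpoint, you would load `pytorch_model.bin` with `torch.load`, pad the three tensors to 50304, and save it back before calling `from_pretrained`. The alternative direction, editing `vocab_size` in the checkpoint's config.json down to 50296 so the model is built to match the saved weights, may be simpler if the extra rows were never trained.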