[INFO|tokenization_utils_base.py:2024] 2024-01-04 10:15:14,792 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2024] 2024-01-04 10:15:14,792 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2024] 2024-01-04 10:15:14,792 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2024] 2024-01-04 10:15:14,792 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2024] 2024-01-04 10:15:14,792 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2024] 2024-01-04 10:15:14,792 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-01-04 10:15:14,844 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:737] 2024-01-04 10:15:14,845 >> loading configuration file ./models/dolphin-2_6-phi-2/config.json
[INFO|configuration_utils.py:737] 2024-01-04 10:15:14,848 >> loading configuration file ./models/dolphin-2_6-phi-2/config.json
[INFO|configuration_utils.py:802] 2024-01-04 10:15:14,849 >> Model config PhiConfig {
  "_name_or_path": "./models/dolphin-2_6-phi-2",
  "activation_function": "gelu_new",
  "architectures": [
    "PhiForCausalLM"
  ],
  "attn_pdrop": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_phi.PhiConfig",
    "AutoModelForCausalLM": "modeling_phi.PhiForCausalLM"
  },
  "embd_pdrop": 0.0,
  "flash_attn": false,
  "flash_rotary": false,
  "fused_dense": false,
  "img_processor": null,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "phi-msft",
  "n_embd": 2560,
  "n_head": 32,
  "n_head_kv": null,
  "n_inner": null,
  "n_layer": 32,
  "n_positions": 2048,
  "resid_pdrop": 0.1,
  "rotary_dim": 32,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.36.2",
  "use_cache": false,
  "vocab_size": 51200
}

[INFO|modeling_utils.py:3341] 2024-01-04 10:15:14,907 >> loading weights file ./models/dolphin-2_6-phi-2/model.safetensors.index.json
[INFO|configuration_utils.py:826] 2024-01-04 10:15:14,908 >> Generate config GenerationConfig {
  "use_cache": false
}

[INFO|configuration_utils.py:826] 2024-01-04 10:15:14,908 >> Generate config GenerationConfig {
  "use_cache": false
}

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:00<00:00, 1.40it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 2.33it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 2.11it/s]
[WARNING|modeling_utils.py:4175] 2024-01-04 10:15:16,121 >> Some weights of the model checkpoint at ./models/dolphin-2_6-phi-2 were not used when initializing PhiForCausalLM: ['lm_head.linear.lora_B.default.weight', 'lm_head.linear.lora_A.default.weight']
- This IS expected if you are initializing PhiForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing PhiForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[INFO|modeling_utils.py:4193] 2024-01-04 10:15:16,122 >> All the weights of PhiForCausalLM were initialized from the model checkpoint at ./models/dolphin-2_6-phi-2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use PhiForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2024-01-04 10:15:16,125 >> loading configuration file ./models/dolphin-2_6-phi-2/generation_config.json
[INFO|configuration_utils.py:826] 2024-01-04 10:15:16,125 >> Generate config GenerationConfig {}
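At this point the base dolphin-2_6-phi-2 checkpoint is fully loaded. For orientation, the same load can be reproduced directly with transformers; because the config above maps `AutoConfig`/`AutoModelForCausalLM` to the checkpoint's own `configuration_phi.py`/`modeling_phi.py` (`model_type: "phi-msft"`), `trust_remote_code=True` is needed. This is only a minimal sketch using the paths from the log, not the exact code LLaMA-Factory runs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_dir = "./models/dolphin-2_6-phi-2"  # base checkpoint path from the log

tokenizer = AutoTokenizer.from_pretrained(base_dir)

# config.json's auto_map points at custom Phi code shipped with the checkpoint,
# so trust_remote_code is required to instantiate PhiForCausalLM.
model = AutoModelForCausalLM.from_pretrained(
    base_dir,
    torch_dtype=torch.float16,  # matches "torch_dtype": "float16" in the config
    trust_remote_code=True,
)
```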
01/04/2024 10:15:16 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
01/04/2024 10:15:17 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
01/04/2024 10:15:17 - INFO - llmtuner.model.adapter - Loaded adapter(s): ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora
01/04/2024 10:15:17 - INFO - llmtuner.model.loader - trainable params: 0 || all params: 2779683840 || trainable%: 0.0000
01/04/2024 10:15:17 - INFO - llmtuner.model.loader - This IS expected that the trainable params is 0 if you are using model for inference only.
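These `llmtuner.model.adapter` lines are where LLaMA-Factory merges the LoRA adapter into the base weights before export. Internally this is a PEFT-style merge; a rough equivalent with the `peft` library (an illustrative sketch, not the exact code path llmtuner takes) looks like this, assuming `model` is the base `PhiForCausalLM` loaded above:

```python
from peft import PeftModel

# Adapter directory taken from the "Loaded adapter(s)" line above.
adapter_dir = "./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora"

# Wrap the base model with the LoRA adapter, then fold the low-rank updates
# into the underlying Linear weights and drop the adapter modules.
peft_model = PeftModel.from_pretrained(model, adapter_dir)
merged_model = peft_model.merge_and_unload()
```

After the merge there are no adapter parameters left to train, which is consistent with the `trainable params: 0` line when the model is loaded purely for export or inference.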
[INFO|configuration_utils.py:483] 2024-01-04 10:15:17,798 >> Configuration saved in ./models/export/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1/config.json
[INFO|configuration_utils.py:594] 2024-01-04 10:15:17,799 >> Configuration saved in ./models/export/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1/generation_config.json
[INFO|modeling_utils.py:2390] 2024-01-04 10:15:23,256 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ./models/export/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2432] 2024-01-04 10:15:23,258 >> tokenizer config file saved in ./models/export/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-04 10:15:23,258 >> Special tokens file saved in ./models/export/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-04 10:15:23,258 >> added tokens file saved in ./models/export/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1/added_tokens.json
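The final block writes the merged model and tokenizer to the export directory, splitting the weights into two safetensors shards because they exceed the 5 GB per-shard limit. The equivalent transformers calls look roughly like this (export path from the log, `merged_model` and `tokenizer` from the sketches above):

```python
export_dir = "./models/export/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1"

# Writes config.json, generation_config.json, the sharded model-*.safetensors
# files, and model.safetensors.index.json.
merged_model.save_pretrained(
    export_dir,
    max_shard_size="5GB",     # matches the 5GB per-shard limit reported above
    safe_serialization=True,  # safetensors format, as in the saved index file
)

# Writes tokenizer_config.json, special_tokens_map.json, added_tokens.json, etc.
tokenizer.save_pretrained(export_dir)
```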