Incorrect vocab size?
First of all, the model is really neat, thank you for sharing!
Second, I tried to convert the model to the GGUF format using the script from llama.cpp (https://github.com/ggerganov/llama.cpp/blob/master/convert.py) and got an error about the vocab size:
Exception: Vocab size mismatch (model has 32256, but Magicoder-S-DS-6.7B/tokenizer.model combined with Magicoder-S-DS-6.7B/added_tokens.json has 32022)
Is it possible that you updated tokenizer.model and added_tokens.json but forgot to update config.json? If I set the vocab_size value in config.json to 32022 (from the original value of 32256), the conversion succeeds, but I'm not sure whether this breaks anything.
Any answer or tip would be highly appreciated.
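The mismatch described above can be reproduced without the model itself. This is a hypothetical sanity check (dummy files stand in for the real model directory; the base vocab size of 32000 is taken from the converter's output below): config.json declares one vocab size, while the tokenizer's base vocab plus added_tokens.json yields another.

```python
import json
import pathlib
import tempfile

# Dummy stand-ins for the real model files, just to illustrate the check.
d = pathlib.Path(tempfile.mkdtemp())
(d / "config.json").write_text(json.dumps({"vocab_size": 32256}))
# 22 added tokens, matching the "22 added tokens" line in the converter log.
(d / "added_tokens.json").write_text(
    json.dumps({f"<tok{i}>": 32000 + i for i in range(22)})
)

declared = json.loads((d / "config.json").read_text())["vocab_size"]
added = len(json.loads((d / "added_tokens.json").read_text()))
base = 32000  # SentencePiece base vocab size reported by the converter

# config.json says 32256, but the tokenizer only defines 32022 tokens.
print(declared, base + added)  # 32256 32022
```

The converter compares exactly these two numbers, which is why editing vocab_size in config.json makes the error go away.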
Thanks for your interest in Magicoder! Magicoder-S-DS-6.7B is based on deepseek-coder-6.7b-base, so the tokenizer configs and the model config should be identical. I did a quick search and found similar issues: https://huggingface.co/TheBloke/deepseek-coder-33B-instruct-GGUF/discussions/2#654a04eb8fde27109bda19c1. Let me quote the response here:
This is not an error, just an info message which can be ignored. The same message is printed by llama.cpp and it has no impact that I've noticed
So I guess you can safely ignore the warning.
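To illustrate why such a mismatch is usually harmless: the embedding matrix can have more rows than the tokenizer defines tokens, and the extra "padding" rows are simply never indexed. This is a simplified illustration (not the real model; the embedding dimension is shrunk for the sketch):

```python
import numpy as np

# The model reserves 32256 embedding rows, but the tokenizer only ever
# produces ids below 32022, so rows 32022..32255 are dead weight.
n_rows, n_tokens, d = 32256, 32022, 8  # d shrunk for the sketch
emb = np.zeros((n_rows, d))

token_ids = np.array([0, 5, n_tokens - 1])  # valid ids stay below n_tokens
vectors = emb[token_ids]                    # padding rows are never touched
print(vectors.shape)  # (3, 8)
```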
Thank you very much for the helpful reply @yuxiang630 !
@yuxiang630, may I ask why the vocab sizes are inconsistent, 32256 vs. 32022?
I tried ignoring the vocab warning but I still can't get this to convert to GGUF. Their CL version works fine, it's just this DeepSeek one.
Hi @lawls, did you check whether the base deepseek-coder-6.7b can be converted successfully, or is this a problem specific to Magicoder-DS?
@yuxiang630 Theirs doesn't work either. When I convert using llama.cpp, I get notified about the wrong vocab size:
python convert.py /Users/lawls/Development/models/Magicoder-S-DS-6.7B/
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00001-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00001-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00002-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00003-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00004-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00005-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00006-of-00006.safetensors
params = Params(n_vocab=32256, n_embd=4096, n_layer=32, n_ctx=16384, n_ff=11008, n_head=32, n_head_kv=32, n_experts=None, n_experts_used=None, f_norm_eps=1e-06, rope_scaling_type=<RopeScalingType.LINEAR: 'linear'>, f_rope_freq_base=100000, f_rope_scale=4.0, n_orig_ctx=None, rope_finetuned=None, ftype=None, path_model=PosixPath('/Users/lawls/Development/models/Magicoder-S-DS-6.7B'))
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
32016 32000
Vocab info: <VocabLoader with 32000 base tokens and 22 added tokens>
Special vocab info: <SpecialVocab with 31757 merges, special tokens {'bos': 32013, 'eos': 32014, 'pad': 32014}, add special tokens {'bos': True, 'eos': False}>
Permuting layer 0
Permuting layer 1
Permuting layer 2
Permuting layer 3
Permuting layer 4
Permuting layer 5
Permuting layer 6
Permuting layer 7
Permuting layer 8
Permuting layer 9
Permuting layer 10
Permuting layer 11
Permuting layer 12
Permuting layer 13
Permuting layer 14
Permuting layer 15
Permuting layer 16
Permuting layer 17
Permuting layer 18
Permuting layer 19
Permuting layer 20
Permuting layer 21
Permuting layer 22
Permuting layer 23
Permuting layer 24
Permuting layer 25
Permuting layer 26
Permuting layer 27
Permuting layer 28
Permuting layer 29
Permuting layer 30
Permuting layer 31
model.embed_tokens.weight -> token_embd.weight | F32 | [32256, 4096]
model.layers.0.input_layernorm.weight -> blk.0.attn_norm.weight | F32 | [4096]
model.layers.0.mlp.down_proj.weight -> blk.0.ffn_down.weight | F32 | [4096, 11008]
model.layers.0.mlp.gate_proj.weight -> blk.0.ffn_gate.weight | F32 | [11008, 4096]
model.layers.0.mlp.up_proj.weight -> blk.0.ffn_up.weight | F32 | [11008, 4096]
model.layers.0.post_attention_layernorm.weight -> blk.0.ffn_norm.weight | F32 | [4096]
model.layers.0.self_attn.k_proj.weight -> blk.0.attn_k.weight | F32 | [4096, 4096]
model.layers.0.self_attn.o_proj.weight -> blk.0.attn_output.weight | F32 | [4096, 4096]
model.layers.0.self_attn.q_proj.weight -> blk.0.attn_q.weight | F32 | [4096, 4096]
model.layers.0.self_attn.v_proj.weight -> blk.0.attn_v.weight | F32 | [4096, 4096]
model.layers.1.input_layernorm.weight -> blk.1.attn_norm.weight | F32 | [4096]
model.layers.1.mlp.down_proj.weight -> blk.1.ffn_down.weight | F32 | [4096, 11008]
model.layers.1.mlp.gate_proj.weight -> blk.1.ffn_gate.weight | F32 | [11008, 4096]
model.layers.1.mlp.up_proj.weight -> blk.1.ffn_up.weight | F32 | [11008, 4096]
model.layers.1.post_attention_layernorm.weight -> blk.1.ffn_norm.weight | F32 | [4096]
model.layers.1.self_attn.k_proj.weight -> blk.1.attn_k.weight | F32 | [4096, 4096]
model.layers.1.self_attn.o_proj.weight -> blk.1.attn_output.weight | F32 | [4096, 4096]
model.layers.1.self_attn.q_proj.weight -> blk.1.attn_q.weight | F32 | [4096, 4096]
model.layers.1.self_attn.v_proj.weight -> blk.1.attn_v.weight | F32 | [4096, 4096]
model.layers.2.input_layernorm.weight -> blk.2.attn_norm.weight | F32 | [4096]
model.layers.2.mlp.down_proj.weight -> blk.2.ffn_down.weight | F32 | [4096, 11008]
model.layers.2.mlp.gate_proj.weight -> blk.2.ffn_gate.weight | F32 | [11008, 4096]
model.layers.2.mlp.up_proj.weight -> blk.2.ffn_up.weight | F32 | [11008, 4096]
model.layers.2.post_attention_layernorm.weight -> blk.2.ffn_norm.weight | F32 | [4096]
model.layers.2.self_attn.k_proj.weight -> blk.2.attn_k.weight | F32 | [4096, 4096]
model.layers.2.self_attn.o_proj.weight -> blk.2.attn_output.weight | F32 | [4096, 4096]
model.layers.2.self_attn.q_proj.weight -> blk.2.attn_q.weight | F32 | [4096, 4096]
model.layers.2.self_attn.v_proj.weight -> blk.2.attn_v.weight | F32 | [4096, 4096]
model.layers.3.input_layernorm.weight -> blk.3.attn_norm.weight | F32 | [4096]
model.layers.3.mlp.down_proj.weight -> blk.3.ffn_down.weight | F32 | [4096, 11008]
model.layers.3.mlp.gate_proj.weight -> blk.3.ffn_gate.weight | F32 | [11008, 4096]
model.layers.3.mlp.up_proj.weight -> blk.3.ffn_up.weight | F32 | [11008, 4096]
model.layers.3.post_attention_layernorm.weight -> blk.3.ffn_norm.weight | F32 | [4096]
model.layers.3.self_attn.k_proj.weight -> blk.3.attn_k.weight | F32 | [4096, 4096]
model.layers.3.self_attn.o_proj.weight -> blk.3.attn_output.weight | F32 | [4096, 4096]
model.layers.3.self_attn.q_proj.weight -> blk.3.attn_q.weight | F32 | [4096, 4096]
model.layers.3.self_attn.v_proj.weight -> blk.3.attn_v.weight | F32 | [4096, 4096]
model.layers.4.input_layernorm.weight -> blk.4.attn_norm.weight | F32 | [4096]
model.layers.4.mlp.down_proj.weight -> blk.4.ffn_down.weight | F32 | [4096, 11008]
model.layers.4.mlp.gate_proj.weight -> blk.4.ffn_gate.weight | F32 | [11008, 4096]
model.layers.4.mlp.up_proj.weight -> blk.4.ffn_up.weight | F32 | [11008, 4096]
model.layers.4.post_attention_layernorm.weight -> blk.4.ffn_norm.weight | F32 | [4096]
model.layers.4.self_attn.k_proj.weight -> blk.4.attn_k.weight | F32 | [4096, 4096]
model.layers.4.self_attn.o_proj.weight -> blk.4.attn_output.weight | F32 | [4096, 4096]
model.layers.4.self_attn.q_proj.weight -> blk.4.attn_q.weight | F32 | [4096, 4096]
model.layers.4.self_attn.v_proj.weight -> blk.4.attn_v.weight | F32 | [4096, 4096]
model.layers.5.self_attn.k_proj.weight -> blk.5.attn_k.weight | F32 | [4096, 4096]
model.layers.5.self_attn.o_proj.weight -> blk.5.attn_output.weight | F32 | [4096, 4096]
model.layers.5.self_attn.q_proj.weight -> blk.5.attn_q.weight | F32 | [4096, 4096]
model.layers.5.self_attn.v_proj.weight -> blk.5.attn_v.weight | F32 | [4096, 4096]
model.layers.10.input_layernorm.weight -> blk.10.attn_norm.weight | F32 | [4096]
model.layers.10.mlp.down_proj.weight -> blk.10.ffn_down.weight | F32 | [4096, 11008]
model.layers.10.mlp.gate_proj.weight -> blk.10.ffn_gate.weight | F32 | [11008, 4096]
model.layers.10.mlp.up_proj.weight -> blk.10.ffn_up.weight | F32 | [11008, 4096]
model.layers.10.post_attention_layernorm.weight -> blk.10.ffn_norm.weight | F32 | [4096]
model.layers.10.self_attn.k_proj.weight -> blk.10.attn_k.weight | F32 | [4096, 4096]
model.layers.10.self_attn.o_proj.weight -> blk.10.attn_output.weight | F32 | [4096, 4096]
model.layers.10.self_attn.q_proj.weight -> blk.10.attn_q.weight | F32 | [4096, 4096]
model.layers.10.self_attn.v_proj.weight -> blk.10.attn_v.weight | F32 | [4096, 4096]
model.layers.11.self_attn.k_proj.weight -> blk.11.attn_k.weight | F32 | [4096, 4096]
model.layers.11.self_attn.o_proj.weight -> blk.11.attn_output.weight | F32 | [4096, 4096]
model.layers.11.self_attn.q_proj.weight -> blk.11.attn_q.weight | F32 | [4096, 4096]
model.layers.11.self_attn.v_proj.weight -> blk.11.attn_v.weight | F32 | [4096, 4096]
model.layers.5.input_layernorm.weight -> blk.5.attn_norm.weight | F32 | [4096]
model.layers.5.mlp.down_proj.weight -> blk.5.ffn_down.weight | F32 | [4096, 11008]
model.layers.5.mlp.gate_proj.weight -> blk.5.ffn_gate.weight | F32 | [11008, 4096]
model.layers.5.mlp.up_proj.weight -> blk.5.ffn_up.weight | F32 | [11008, 4096]
model.layers.5.post_attention_layernorm.weight -> blk.5.ffn_norm.weight | F32 | [4096]
model.layers.6.input_layernorm.weight -> blk.6.attn_norm.weight | F32 | [4096]
model.layers.6.mlp.down_proj.weight -> blk.6.ffn_down.weight | F32 | [4096, 11008]
model.layers.6.mlp.gate_proj.weight -> blk.6.ffn_gate.weight | F32 | [11008, 4096]
model.layers.6.mlp.up_proj.weight -> blk.6.ffn_up.weight | F32 | [11008, 4096]
model.layers.6.post_attention_layernorm.weight -> blk.6.ffn_norm.weight | F32 | [4096]
model.layers.6.self_attn.k_proj.weight -> blk.6.attn_k.weight | F32 | [4096, 4096]
model.layers.6.self_attn.o_proj.weight -> blk.6.attn_output.weight | F32 | [4096, 4096]
model.layers.6.self_attn.q_proj.weight -> blk.6.attn_q.weight | F32 | [4096, 4096]
model.layers.6.self_attn.v_proj.weight -> blk.6.attn_v.weight | F32 | [4096, 4096]
model.layers.7.input_layernorm.weight -> blk.7.attn_norm.weight | F32 | [4096]
model.layers.7.mlp.down_proj.weight -> blk.7.ffn_down.weight | F32 | [4096, 11008]
model.layers.7.mlp.gate_proj.weight -> blk.7.ffn_gate.weight | F32 | [11008, 4096]
model.layers.7.mlp.up_proj.weight -> blk.7.ffn_up.weight | F32 | [11008, 4096]
model.layers.7.post_attention_layernorm.weight -> blk.7.ffn_norm.weight | F32 | [4096]
model.layers.7.self_attn.k_proj.weight -> blk.7.attn_k.weight | F32 | [4096, 4096]
model.layers.7.self_attn.o_proj.weight -> blk.7.attn_output.weight | F32 | [4096, 4096]
model.layers.7.self_attn.q_proj.weight -> blk.7.attn_q.weight | F32 | [4096, 4096]
model.layers.7.self_attn.v_proj.weight -> blk.7.attn_v.weight | F32 | [4096, 4096]
model.layers.8.input_layernorm.weight -> blk.8.attn_norm.weight | F32 | [4096]
model.layers.8.mlp.down_proj.weight -> blk.8.ffn_down.weight | F32 | [4096, 11008]
model.layers.8.mlp.gate_proj.weight -> blk.8.ffn_gate.weight | F32 | [11008, 4096]
model.layers.8.mlp.up_proj.weight -> blk.8.ffn_up.weight | F32 | [11008, 4096]
model.layers.8.post_attention_layernorm.weight -> blk.8.ffn_norm.weight | F32 | [4096]
model.layers.8.self_attn.k_proj.weight -> blk.8.attn_k.weight | F32 | [4096, 4096]
model.layers.8.self_attn.o_proj.weight -> blk.8.attn_output.weight | F32 | [4096, 4096]
model.layers.8.self_attn.q_proj.weight -> blk.8.attn_q.weight | F32 | [4096, 4096]
model.layers.8.self_attn.v_proj.weight -> blk.8.attn_v.weight | F32 | [4096, 4096]
model.layers.9.input_layernorm.weight -> blk.9.attn_norm.weight | F32 | [4096]
model.layers.9.mlp.down_proj.weight -> blk.9.ffn_down.weight | F32 | [4096, 11008]
model.layers.9.mlp.gate_proj.weight -> blk.9.ffn_gate.weight | F32 | [11008, 4096]
model.layers.9.mlp.up_proj.weight -> blk.9.ffn_up.weight | F32 | [11008, 4096]
model.layers.9.post_attention_layernorm.weight -> blk.9.ffn_norm.weight | F32 | [4096]
model.layers.9.self_attn.k_proj.weight -> blk.9.attn_k.weight | F32 | [4096, 4096]
model.layers.9.self_attn.o_proj.weight -> blk.9.attn_output.weight | F32 | [4096, 4096]
model.layers.9.self_attn.q_proj.weight -> blk.9.attn_q.weight | F32 | [4096, 4096]
model.layers.9.self_attn.v_proj.weight -> blk.9.attn_v.weight | F32 | [4096, 4096]
model.layers.11.input_layernorm.weight -> blk.11.attn_norm.weight | F32 | [4096]
model.layers.11.mlp.down_proj.weight -> blk.11.ffn_down.weight | F32 | [4096, 11008]
model.layers.11.mlp.gate_proj.weight -> blk.11.ffn_gate.weight | F32 | [11008, 4096]
model.layers.11.mlp.up_proj.weight -> blk.11.ffn_up.weight | F32 | [11008, 4096]
model.layers.11.post_attention_layernorm.weight -> blk.11.ffn_norm.weight | F32 | [4096]
model.layers.12.input_layernorm.weight -> blk.12.attn_norm.weight | F32 | [4096]
model.layers.12.mlp.down_proj.weight -> blk.12.ffn_down.weight | F32 | [4096, 11008]
model.layers.12.mlp.gate_proj.weight -> blk.12.ffn_gate.weight | F32 | [11008, 4096]
model.layers.12.mlp.up_proj.weight -> blk.12.ffn_up.weight | F32 | [11008, 4096]
model.layers.12.post_attention_layernorm.weight -> blk.12.ffn_norm.weight | F32 | [4096]
model.layers.12.self_attn.k_proj.weight -> blk.12.attn_k.weight | F32 | [4096, 4096]
model.layers.12.self_attn.o_proj.weight -> blk.12.attn_output.weight | F32 | [4096, 4096]
model.layers.12.self_attn.q_proj.weight -> blk.12.attn_q.weight | F32 | [4096, 4096]
model.layers.12.self_attn.v_proj.weight -> blk.12.attn_v.weight | F32 | [4096, 4096]
model.layers.13.input_layernorm.weight -> blk.13.attn_norm.weight | F32 | [4096]
model.layers.13.mlp.down_proj.weight -> blk.13.ffn_down.weight | F32 | [4096, 11008]
model.layers.13.mlp.gate_proj.weight -> blk.13.ffn_gate.weight | F32 | [11008, 4096]
model.layers.13.mlp.up_proj.weight -> blk.13.ffn_up.weight | F32 | [11008, 4096]
model.layers.13.post_attention_layernorm.weight -> blk.13.ffn_norm.weight | F32 | [4096]
model.layers.13.self_attn.k_proj.weight -> blk.13.attn_k.weight | F32 | [4096, 4096]
model.layers.13.self_attn.o_proj.weight -> blk.13.attn_output.weight | F32 | [4096, 4096]
model.layers.13.self_attn.q_proj.weight -> blk.13.attn_q.weight | F32 | [4096, 4096]
model.layers.13.self_attn.v_proj.weight -> blk.13.attn_v.weight | F32 | [4096, 4096]
model.layers.14.input_layernorm.weight -> blk.14.attn_norm.weight | F32 | [4096]
model.layers.14.mlp.down_proj.weight -> blk.14.ffn_down.weight | F32 | [4096, 11008]
model.layers.14.mlp.gate_proj.weight -> blk.14.ffn_gate.weight | F32 | [11008, 4096]
model.layers.14.mlp.up_proj.weight -> blk.14.ffn_up.weight | F32 | [11008, 4096]
model.layers.14.post_attention_layernorm.weight -> blk.14.ffn_norm.weight | F32 | [4096]
model.layers.14.self_attn.k_proj.weight -> blk.14.attn_k.weight | F32 | [4096, 4096]
model.layers.14.self_attn.o_proj.weight -> blk.14.attn_output.weight | F32 | [4096, 4096]
model.layers.14.self_attn.q_proj.weight -> blk.14.attn_q.weight | F32 | [4096, 4096]
model.layers.14.self_attn.v_proj.weight -> blk.14.attn_v.weight | F32 | [4096, 4096]
model.layers.15.input_layernorm.weight -> blk.15.attn_norm.weight | F32 | [4096]
model.layers.15.mlp.down_proj.weight -> blk.15.ffn_down.weight | F32 | [4096, 11008]
model.layers.15.mlp.gate_proj.weight -> blk.15.ffn_gate.weight | F32 | [11008, 4096]
model.layers.15.mlp.up_proj.weight -> blk.15.ffn_up.weight | F32 | [11008, 4096]
model.layers.15.post_attention_layernorm.weight -> blk.15.ffn_norm.weight | F32 | [4096]
model.layers.15.self_attn.k_proj.weight -> blk.15.attn_k.weight | F32 | [4096, 4096]
model.layers.15.self_attn.o_proj.weight -> blk.15.attn_output.weight | F32 | [4096, 4096]
model.layers.15.self_attn.q_proj.weight -> blk.15.attn_q.weight | F32 | [4096, 4096]
model.layers.15.self_attn.v_proj.weight -> blk.15.attn_v.weight | F32 | [4096, 4096]
model.layers.16.input_layernorm.weight -> blk.16.attn_norm.weight | F32 | [4096]
model.layers.16.mlp.down_proj.weight -> blk.16.ffn_down.weight | F32 | [4096, 11008]
model.layers.16.mlp.gate_proj.weight -> blk.16.ffn_gate.weight | F32 | [11008, 4096]
model.layers.16.mlp.up_proj.weight -> blk.16.ffn_up.weight | F32 | [11008, 4096]
model.layers.16.post_attention_layernorm.weight -> blk.16.ffn_norm.weight | F32 | [4096]
model.layers.16.self_attn.k_proj.weight -> blk.16.attn_k.weight | F32 | [4096, 4096]
model.layers.16.self_attn.o_proj.weight -> blk.16.attn_output.weight | F32 | [4096, 4096]
model.layers.16.self_attn.q_proj.weight -> blk.16.attn_q.weight | F32 | [4096, 4096]
model.layers.16.self_attn.v_proj.weight -> blk.16.attn_v.weight | F32 | [4096, 4096]
model.layers.17.self_attn.k_proj.weight -> blk.17.attn_k.weight | F32 | [4096, 4096]
model.layers.17.self_attn.o_proj.weight -> blk.17.attn_output.weight | F32 | [4096, 4096]
model.layers.17.self_attn.q_proj.weight -> blk.17.attn_q.weight | F32 | [4096, 4096]
model.layers.17.self_attn.v_proj.weight -> blk.17.attn_v.weight | F32 | [4096, 4096]
model.layers.17.input_layernorm.weight -> blk.17.attn_norm.weight | F32 | [4096]
model.layers.17.mlp.down_proj.weight -> blk.17.ffn_down.weight | F32 | [4096, 11008]
model.layers.17.mlp.gate_proj.weight -> blk.17.ffn_gate.weight | F32 | [11008, 4096]
model.layers.17.mlp.up_proj.weight -> blk.17.ffn_up.weight | F32 | [11008, 4096]
model.layers.17.post_attention_layernorm.weight -> blk.17.ffn_norm.weight | F32 | [4096]
model.layers.18.input_layernorm.weight -> blk.18.attn_norm.weight | F32 | [4096]
model.layers.18.mlp.down_proj.weight -> blk.18.ffn_down.weight | F32 | [4096, 11008]
model.layers.18.mlp.gate_proj.weight -> blk.18.ffn_gate.weight | F32 | [11008, 4096]
model.layers.18.mlp.up_proj.weight -> blk.18.ffn_up.weight | F32 | [11008, 4096]
model.layers.18.post_attention_layernorm.weight -> blk.18.ffn_norm.weight | F32 | [4096]
model.layers.18.self_attn.k_proj.weight -> blk.18.attn_k.weight | F32 | [4096, 4096]
model.layers.18.self_attn.o_proj.weight -> blk.18.attn_output.weight | F32 | [4096, 4096]
model.layers.18.self_attn.q_proj.weight -> blk.18.attn_q.weight | F32 | [4096, 4096]
model.layers.18.self_attn.v_proj.weight -> blk.18.attn_v.weight | F32 | [4096, 4096]
model.layers.19.input_layernorm.weight -> blk.19.attn_norm.weight | F32 | [4096]
model.layers.19.mlp.down_proj.weight -> blk.19.ffn_down.weight | F32 | [4096, 11008]
model.layers.19.mlp.gate_proj.weight -> blk.19.ffn_gate.weight | F32 | [11008, 4096]
model.layers.19.mlp.up_proj.weight -> blk.19.ffn_up.weight | F32 | [11008, 4096]
model.layers.19.post_attention_layernorm.weight -> blk.19.ffn_norm.weight | F32 | [4096]
model.layers.19.self_attn.k_proj.weight -> blk.19.attn_k.weight | F32 | [4096, 4096]
model.layers.19.self_attn.o_proj.weight -> blk.19.attn_output.weight | F32 | [4096, 4096]
model.layers.19.self_attn.q_proj.weight -> blk.19.attn_q.weight | F32 | [4096, 4096]
model.layers.19.self_attn.v_proj.weight -> blk.19.attn_v.weight | F32 | [4096, 4096]
model.layers.20.input_layernorm.weight -> blk.20.attn_norm.weight | F32 | [4096]
model.layers.20.mlp.down_proj.weight -> blk.20.ffn_down.weight | F32 | [4096, 11008]
model.layers.20.mlp.gate_proj.weight -> blk.20.ffn_gate.weight | F32 | [11008, 4096]
model.layers.20.mlp.up_proj.weight -> blk.20.ffn_up.weight | F32 | [11008, 4096]
model.layers.20.post_attention_layernorm.weight -> blk.20.ffn_norm.weight | F32 | [4096]
model.layers.20.self_attn.k_proj.weight -> blk.20.attn_k.weight | F32 | [4096, 4096]
model.layers.20.self_attn.o_proj.weight -> blk.20.attn_output.weight | F32 | [4096, 4096]
model.layers.20.self_attn.q_proj.weight -> blk.20.attn_q.weight | F32 | [4096, 4096]
model.layers.20.self_attn.v_proj.weight -> blk.20.attn_v.weight | F32 | [4096, 4096]
model.layers.21.input_layernorm.weight -> blk.21.attn_norm.weight | F32 | [4096]
model.layers.21.mlp.down_proj.weight -> blk.21.ffn_down.weight | F32 | [4096, 11008]
model.layers.21.mlp.gate_proj.weight -> blk.21.ffn_gate.weight | F32 | [11008, 4096]
model.layers.21.mlp.up_proj.weight -> blk.21.ffn_up.weight | F32 | [11008, 4096]
model.layers.21.post_attention_layernorm.weight -> blk.21.ffn_norm.weight | F32 | [4096]
model.layers.21.self_attn.k_proj.weight -> blk.21.attn_k.weight | F32 | [4096, 4096]
model.layers.21.self_attn.o_proj.weight -> blk.21.attn_output.weight | F32 | [4096, 4096]
model.layers.21.self_attn.q_proj.weight -> blk.21.attn_q.weight | F32 | [4096, 4096]
model.layers.21.self_attn.v_proj.weight -> blk.21.attn_v.weight | F32 | [4096, 4096]
model.layers.22.input_layernorm.weight -> blk.22.attn_norm.weight | F32 | [4096]
model.layers.22.mlp.down_proj.weight -> blk.22.ffn_down.weight | F32 | [4096, 11008]
model.layers.22.mlp.gate_proj.weight -> blk.22.ffn_gate.weight | F32 | [11008, 4096]
model.layers.22.mlp.up_proj.weight -> blk.22.ffn_up.weight | F32 | [11008, 4096]
model.layers.22.post_attention_layernorm.weight -> blk.22.ffn_norm.weight | F32 | [4096]
model.layers.22.self_attn.k_proj.weight -> blk.22.attn_k.weight | F32 | [4096, 4096]
model.layers.22.self_attn.o_proj.weight -> blk.22.attn_output.weight | F32 | [4096, 4096]
model.layers.22.self_attn.q_proj.weight -> blk.22.attn_q.weight | F32 | [4096, 4096]
model.layers.22.self_attn.v_proj.weight -> blk.22.attn_v.weight | F32 | [4096, 4096]
model.layers.23.self_attn.k_proj.weight -> blk.23.attn_k.weight | F32 | [4096, 4096]
model.layers.23.self_attn.o_proj.weight -> blk.23.attn_output.weight | F32 | [4096, 4096]
model.layers.23.self_attn.q_proj.weight -> blk.23.attn_q.weight | F32 | [4096, 4096]
model.layers.23.self_attn.v_proj.weight -> blk.23.attn_v.weight | F32 | [4096, 4096]
model.layers.23.input_layernorm.weight -> blk.23.attn_norm.weight | F32 | [4096]
model.layers.23.mlp.down_proj.weight -> blk.23.ffn_down.weight | F32 | [4096, 11008]
model.layers.23.mlp.gate_proj.weight -> blk.23.ffn_gate.weight | F32 | [11008, 4096]
model.layers.23.mlp.up_proj.weight -> blk.23.ffn_up.weight | F32 | [11008, 4096]
model.layers.23.post_attention_layernorm.weight -> blk.23.ffn_norm.weight | F32 | [4096]
model.layers.24.input_layernorm.weight -> blk.24.attn_norm.weight | F32 | [4096]
model.layers.24.mlp.down_proj.weight -> blk.24.ffn_down.weight | F32 | [4096, 11008]
model.layers.24.mlp.gate_proj.weight -> blk.24.ffn_gate.weight | F32 | [11008, 4096]
model.layers.24.mlp.up_proj.weight -> blk.24.ffn_up.weight | F32 | [11008, 4096]
model.layers.24.post_attention_layernorm.weight -> blk.24.ffn_norm.weight | F32 | [4096]
model.layers.24.self_attn.k_proj.weight -> blk.24.attn_k.weight | F32 | [4096, 4096]
model.layers.24.self_attn.o_proj.weight -> blk.24.attn_output.weight | F32 | [4096, 4096]
model.layers.24.self_attn.q_proj.weight -> blk.24.attn_q.weight | F32 | [4096, 4096]
model.layers.24.self_attn.v_proj.weight -> blk.24.attn_v.weight | F32 | [4096, 4096]
model.layers.25.input_layernorm.weight -> blk.25.attn_norm.weight | F32 | [4096]
model.layers.25.mlp.down_proj.weight -> blk.25.ffn_down.weight | F32 | [4096, 11008]
model.layers.25.mlp.gate_proj.weight -> blk.25.ffn_gate.weight | F32 | [11008, 4096]
model.layers.25.mlp.up_proj.weight -> blk.25.ffn_up.weight | F32 | [11008, 4096]
model.layers.25.post_attention_layernorm.weight -> blk.25.ffn_norm.weight | F32 | [4096]
model.layers.25.self_attn.k_proj.weight -> blk.25.attn_k.weight | F32 | [4096, 4096]
model.layers.25.self_attn.o_proj.weight -> blk.25.attn_output.weight | F32 | [4096, 4096]
model.layers.25.self_attn.q_proj.weight -> blk.25.attn_q.weight | F32 | [4096, 4096]
model.layers.25.self_attn.v_proj.weight -> blk.25.attn_v.weight | F32 | [4096, 4096]
model.layers.26.input_layernorm.weight -> blk.26.attn_norm.weight | F32 | [4096]
model.layers.26.mlp.down_proj.weight -> blk.26.ffn_down.weight | F32 | [4096, 11008]
model.layers.26.mlp.gate_proj.weight -> blk.26.ffn_gate.weight | F32 | [11008, 4096]
model.layers.26.mlp.up_proj.weight -> blk.26.ffn_up.weight | F32 | [11008, 4096]
model.layers.26.post_attention_layernorm.weight -> blk.26.ffn_norm.weight | F32 | [4096]
model.layers.26.self_attn.k_proj.weight -> blk.26.attn_k.weight | F32 | [4096, 4096]
model.layers.26.self_attn.o_proj.weight -> blk.26.attn_output.weight | F32 | [4096, 4096]
model.layers.26.self_attn.q_proj.weight -> blk.26.attn_q.weight | F32 | [4096, 4096]
model.layers.26.self_attn.v_proj.weight -> blk.26.attn_v.weight | F32 | [4096, 4096]
model.layers.27.input_layernorm.weight -> blk.27.attn_norm.weight | F32 | [4096]
model.layers.27.mlp.down_proj.weight -> blk.27.ffn_down.weight | F32 | [4096, 11008]
model.layers.27.mlp.gate_proj.weight -> blk.27.ffn_gate.weight | F32 | [11008, 4096]
model.layers.27.mlp.up_proj.weight -> blk.27.ffn_up.weight | F32 | [11008, 4096]
model.layers.27.post_attention_layernorm.weight -> blk.27.ffn_norm.weight | F32 | [4096]
model.layers.27.self_attn.k_proj.weight -> blk.27.attn_k.weight | F32 | [4096, 4096]
model.layers.27.self_attn.o_proj.weight -> blk.27.attn_output.weight | F32 | [4096, 4096]
model.layers.27.self_attn.q_proj.weight -> blk.27.attn_q.weight | F32 | [4096, 4096]
model.layers.27.self_attn.v_proj.weight -> blk.27.attn_v.weight | F32 | [4096, 4096]
model.layers.28.input_layernorm.weight -> blk.28.attn_norm.weight | F32 | [4096]
model.layers.28.mlp.down_proj.weight -> blk.28.ffn_down.weight | F32 | [4096, 11008]
model.layers.28.mlp.gate_proj.weight -> blk.28.ffn_gate.weight | F32 | [11008, 4096]
model.layers.28.mlp.up_proj.weight -> blk.28.ffn_up.weight | F32 | [11008, 4096]
model.layers.28.post_attention_layernorm.weight -> blk.28.ffn_norm.weight | F32 | [4096]
model.layers.28.self_attn.k_proj.weight -> blk.28.attn_k.weight | F32 | [4096, 4096]
model.layers.28.self_attn.o_proj.weight -> blk.28.attn_output.weight | F32 | [4096, 4096]
model.layers.28.self_attn.q_proj.weight -> blk.28.attn_q.weight | F32 | [4096, 4096]
model.layers.28.self_attn.v_proj.weight -> blk.28.attn_v.weight | F32 | [4096, 4096]
model.layers.29.self_attn.k_proj.weight -> blk.29.attn_k.weight | F32 | [4096, 4096]
model.layers.29.self_attn.o_proj.weight -> blk.29.attn_output.weight | F32 | [4096, 4096]
model.layers.29.self_attn.q_proj.weight -> blk.29.attn_q.weight | F32 | [4096, 4096]
model.layers.29.self_attn.v_proj.weight -> blk.29.attn_v.weight | F32 | [4096, 4096]
lm_head.weight -> output.weight | F32 | [32256, 4096]
model.layers.29.input_layernorm.weight -> blk.29.attn_norm.weight | F32 | [4096]
model.layers.29.mlp.down_proj.weight -> blk.29.ffn_down.weight | F32 | [4096, 11008]
model.layers.29.mlp.gate_proj.weight -> blk.29.ffn_gate.weight | F32 | [11008, 4096]
model.layers.29.mlp.up_proj.weight -> blk.29.ffn_up.weight | F32 | [11008, 4096]
model.layers.29.post_attention_layernorm.weight -> blk.29.ffn_norm.weight | F32 | [4096]
model.layers.30.input_layernorm.weight -> blk.30.attn_norm.weight | F32 | [4096]
model.layers.30.mlp.down_proj.weight -> blk.30.ffn_down.weight | F32 | [4096, 11008]
model.layers.30.mlp.gate_proj.weight -> blk.30.ffn_gate.weight | F32 | [11008, 4096]
model.layers.30.mlp.up_proj.weight -> blk.30.ffn_up.weight | F32 | [11008, 4096]
model.layers.30.post_attention_layernorm.weight -> blk.30.ffn_norm.weight | F32 | [4096]
model.layers.30.self_attn.k_proj.weight -> blk.30.attn_k.weight | F32 | [4096, 4096]
model.layers.30.self_attn.o_proj.weight -> blk.30.attn_output.weight | F32 | [4096, 4096]
model.layers.30.self_attn.q_proj.weight -> blk.30.attn_q.weight | F32 | [4096, 4096]
model.layers.30.self_attn.v_proj.weight -> blk.30.attn_v.weight | F32 | [4096, 4096]
model.layers.31.input_layernorm.weight -> blk.31.attn_norm.weight | F32 | [4096]
model.layers.31.mlp.down_proj.weight -> blk.31.ffn_down.weight | F32 | [4096, 11008]
model.layers.31.mlp.gate_proj.weight -> blk.31.ffn_gate.weight | F32 | [11008, 4096]
model.layers.31.mlp.up_proj.weight -> blk.31.ffn_up.weight | F32 | [11008, 4096]
model.layers.31.post_attention_layernorm.weight -> blk.31.ffn_norm.weight | F32 | [4096]
model.layers.31.self_attn.k_proj.weight -> blk.31.attn_k.weight | F32 | [4096, 4096]
model.layers.31.self_attn.o_proj.weight -> blk.31.attn_output.weight | F32 | [4096, 4096]
model.layers.31.self_attn.q_proj.weight -> blk.31.attn_q.weight | F32 | [4096, 4096]
model.layers.31.self_attn.v_proj.weight -> blk.31.attn_v.weight | F32 | [4096, 4096]
model.norm.weight -> output_norm.weight | F32 | [4096]
Writing /Users/lawls/Development/models/Magicoder-S-DS-6.7B/ggml-model-f32.gguf, format 0
Traceback (most recent call last):
File "/Users/lawls/Development/python/llama.cpp/convert.py", line 1279, in <module>
main()
File "/Users/lawls/Development/python/llama.cpp/convert.py", line 1273, in main
OutputFile.write_all(outfile, ftype, params, model, vocab, special_vocab,
File "/Users/lawls/Development/python/llama.cpp/convert.py", line 988, in write_all
check_vocab_size(params, vocab, pad_vocab = pad_vocab)
File "/Users/lawls/Development/python/llama.cpp/convert.py", line 860, in check_vocab_size
raise Exception(msg)
Exception: Vocab size mismatch (model has 32256, but /Users/lawls/Development/models/Magicoder-S-DS-6.7B has 32022). Possibly try using the --padvocab option.
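The check that raises this exception can be sketched roughly like this (a simplified illustration, not the actual llama.cpp source): the converter compares the model's embedding row count with the tokenizer's token count, and only pads the tokenizer side when padding is explicitly allowed.

```python
def check_vocab_size(model_vocab: int, tokenizer_vocab: int, pad_vocab: bool) -> int:
    """Simplified sketch of the converter's vocab-size check."""
    if model_vocab == tokenizer_vocab:
        return tokenizer_vocab
    if model_vocab > tokenizer_vocab and pad_vocab:
        # With padding allowed, dummy tokens fill the gap up to the model size.
        return model_vocab
    raise ValueError(
        f"Vocab size mismatch (model has {model_vocab}, "
        f"tokenizer has {tokenizer_vocab})"
    )

print(check_vocab_size(32256, 32022, pad_vocab=True))  # 32256
```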
Using the --padvocab option produces a .gguf file, but whenever I try to load it, I get this error. I do not have this issue with deepseek-coder-6.7b-base. Beyond this, I have no idea what I am doing or what I would even do.
./server -m /Users/lawls/Development/models/Magicoder-S-DS-6.7B/ggml-model-f32.gguf --mlock
{"timestamp":1703169999,"level":"INFO","function":"main","line":2668,"message":"build info","build":1663,"commit":"799fc22"}
{"timestamp":1703169999,"level":"INFO","function":"main","line":2675,"message":"system info","n_threads":12,"n_threads_batch":-1,"total_threads":16,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | "}
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /Users/lawls/Development/models/Magicoder-S-DS-6.7B/ggml-model-f32.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor 0: token_embd.weight f32 [ 4096, 32256, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 10: blk.1.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 11: blk.1.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 19: blk.2.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 20: blk.2.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 21: blk.2.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 22: blk.2.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 23: blk.2.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 24: blk.2.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 25: blk.2.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 26: blk.2.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 27: blk.2.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 28: blk.3.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 29: blk.3.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 30: blk.3.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 31: blk.3.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 32: blk.3.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 33: blk.3.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 34: blk.3.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 35: blk.3.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 36: blk.3.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 37: blk.4.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 38: blk.4.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 39: blk.4.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 40: blk.4.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 41: blk.4.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 42: blk.4.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 43: blk.4.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 44: blk.4.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 45: blk.4.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 46: blk.5.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 47: blk.5.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 48: blk.5.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 49: blk.5.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 50: blk.10.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 51: blk.10.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 52: blk.10.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 53: blk.10.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 54: blk.10.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 55: blk.10.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 56: blk.10.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 57: blk.10.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 58: blk.10.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 59: blk.11.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 60: blk.11.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 61: blk.11.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 62: blk.11.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 63: blk.5.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 64: blk.5.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 65: blk.5.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 66: blk.5.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 67: blk.5.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 68: blk.6.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 69: blk.6.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 70: blk.6.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 71: blk.6.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 72: blk.6.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 73: blk.6.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 74: blk.6.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 75: blk.6.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 76: blk.6.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 77: blk.7.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 78: blk.7.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 79: blk.7.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 80: blk.7.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 81: blk.7.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 82: blk.7.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 83: blk.7.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 84: blk.7.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 85: blk.7.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 86: blk.8.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 87: blk.8.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 88: blk.8.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 89: blk.8.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 90: blk.8.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 91: blk.8.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 92: blk.8.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 93: blk.8.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 94: blk.8.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 95: blk.9.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 96: blk.9.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 97: blk.9.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 98: blk.9.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 99: blk.9.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 100: blk.9.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 101: blk.9.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 102: blk.9.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 103: blk.9.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 104: blk.11.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 105: blk.11.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 106: blk.11.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 107: blk.11.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 108: blk.11.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 109: blk.12.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 110: blk.12.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 111: blk.12.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 112: blk.12.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 113: blk.12.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 114: blk.12.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 115: blk.12.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 116: blk.12.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 117: blk.12.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 118: blk.13.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 119: blk.13.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 120: blk.13.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 121: blk.13.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 122: blk.13.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 123: blk.13.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 124: blk.13.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 125: blk.13.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 126: blk.13.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 127: blk.14.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 128: blk.14.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 129: blk.14.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 130: blk.14.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 131: blk.14.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 132: blk.14.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 133: blk.14.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 134: blk.14.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 135: blk.14.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 136: blk.15.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 137: blk.15.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 138: blk.15.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 139: blk.15.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 140: blk.15.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 141: blk.15.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 142: blk.15.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 143: blk.15.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 144: blk.15.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 145: blk.16.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 146: blk.16.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 147: blk.16.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 148: blk.16.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 149: blk.16.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 150: blk.16.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 151: blk.16.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 152: blk.16.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 153: blk.16.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 154: blk.17.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 155: blk.17.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 156: blk.17.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 157: blk.17.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 158: blk.17.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 159: blk.17.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 160: blk.17.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 161: blk.17.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 162: blk.17.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 163: blk.18.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 164: blk.18.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 165: blk.18.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 166: blk.18.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 167: blk.18.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 168: blk.18.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 169: blk.18.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 170: blk.18.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 171: blk.18.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 172: blk.19.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 173: blk.19.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 174: blk.19.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 175: blk.19.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 176: blk.19.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 177: blk.19.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 178: blk.19.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 179: blk.19.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 180: blk.19.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 181: blk.20.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 182: blk.20.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 183: blk.20.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 184: blk.20.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 185: blk.20.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 186: blk.20.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 187: blk.20.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 188: blk.20.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 189: blk.20.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 190: blk.21.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 191: blk.21.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 192: blk.21.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 193: blk.21.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 194: blk.21.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 195: blk.21.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 196: blk.21.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 197: blk.21.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 198: blk.21.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 199: blk.22.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 200: blk.22.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 201: blk.22.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 202: blk.22.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 203: blk.22.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 204: blk.22.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 205: blk.22.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 206: blk.22.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 207: blk.22.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 208: blk.23.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 209: blk.23.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 210: blk.23.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 211: blk.23.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 212: blk.23.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 213: blk.23.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 214: blk.23.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 215: blk.23.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 216: blk.23.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 217: blk.24.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 218: blk.24.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 219: blk.24.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 220: blk.24.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 221: blk.24.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 222: blk.24.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 223: blk.24.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 224: blk.24.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 225: blk.24.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 226: blk.25.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 227: blk.25.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 228: blk.25.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 229: blk.25.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 230: blk.25.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 231: blk.25.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 232: blk.25.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 233: blk.25.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 234: blk.25.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 235: blk.26.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 236: blk.26.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 237: blk.26.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 238: blk.26.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 239: blk.26.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 240: blk.26.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 241: blk.26.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 242: blk.26.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 243: blk.26.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 244: blk.27.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 245: blk.27.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 246: blk.27.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 247: blk.27.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 248: blk.27.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 249: blk.27.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 250: blk.27.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 251: blk.27.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 252: blk.27.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 253: blk.28.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 254: blk.28.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 255: blk.28.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 256: blk.28.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 257: blk.28.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 258: blk.28.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 259: blk.28.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 260: blk.28.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 261: blk.28.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 262: blk.29.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 263: blk.29.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 264: blk.29.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 265: blk.29.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 266: output.weight f32 [ 4096, 32256, 1, 1 ]
llama_model_loader: - tensor 267: blk.29.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 268: blk.29.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 269: blk.29.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 270: blk.29.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 271: blk.29.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 272: blk.30.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 273: blk.30.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 274: blk.30.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 275: blk.30.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 276: blk.30.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 277: blk.30.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 278: blk.30.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 279: blk.30.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 280: blk.30.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 281: blk.31.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 282: blk.31.ffn_down.weight f32 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 283: blk.31.ffn_gate.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 284: blk.31.ffn_up.weight f32 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 285: blk.31.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 286: blk.31.attn_k.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 287: blk.31.attn_output.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 288: blk.31.attn_q.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 289: blk.31.attn_v.weight f32 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 290: output_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = models
llama_model_loader: - kv 2: llama.context_length u32 = 16384
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 100000.000000
llama_model_loader: - kv 11: llama.rope.scaling.type str = linear
llama_model_loader: - kv 12: llama.rope.scaling.factor f32 = 4.000000
llama_model_loader: - kv 13: general.file_type u32 = 0
llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,32256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,32256] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,32256] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 18: tokenizer.ggml.merges arr[str,31757] = ["Ġ Ġ", "Ġ t", "Ġ a", "i n", "h e...
llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 32013
llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 32014
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 32014
llama_model_loader: - kv 22: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 23: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - type f32: 291 tensors
error loading model: unordered_map::at: key not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/Users/lawls/Development/models/Magicoder-S-DS-6.7B/ggml-model-f32.gguf'
{"timestamp":1703169999,"level":"ERROR","function":"load_model","line":581,"message":"unable to load model","model":"/Users/lawls/Development/models/Magicoder-S-DS-6.7B/ggml-model-f32.gguf"}
Yeah, --pad-vocab doesn't help the situation: while the model converts fine, it generates garbage during inference. TheBloke's earlier quants didn't work either - llama.cpp exits with a vocab-related error. The matthoffner/Magicoder-S-DS-6.7B-GGUF quants worked for me.
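For anyone wondering what --pad-vocab actually does to close the 32022 vs. 32256 gap: it extends the token list with placeholder entries until the count matches the `vocab_size` declared in config.json. A minimal sketch of that idea (the placeholder naming below is illustrative, not llama.cpp's exact format):

```python
def pad_vocab(tokens, target_size):
    """Mimic the idea behind llama.cpp's --pad-vocab: extend the token
    list with placeholder entries until it matches the model's declared
    vocab size. Placeholder naming here is illustrative only."""
    padded = list(tokens)
    while len(padded) < target_size:
        padded.append(f"<dummy{len(padded):05}>")
    return padded

# The mismatch from this thread: the tokenizer files supply 32022
# tokens, while config.json declares vocab_size = 32256.
vocab = [f"token_{i}" for i in range(32022)]
padded = pad_vocab(vocab, 32256)
print(len(padded))  # 32256
```

Note that padding only fixes the count: the extra 234 rows of the embedding and output matrices were never trained against real tokens, which may be part of why a padded conversion can still misbehave if those ids are ever produced.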