tokenizer.model seems to be empty?
It is just 131 bytes, which does not look right.
tokenizer.model seems to be a Git LFS pointer checked in as a text file instead of the actual file. This is clearly wrong.
It seems to be the same tokenizer as Llama 2; at least the hash it lists is the same: oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
https://huggingface.co/meta-llama/Llama-2-7b/blob/main/tokenizer.model
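For anyone who wants to check whether a downloaded file is a Git LFS pointer rather than the real payload, here is a minimal sketch. It assumes the standard pointer layout (a `version https://git-lfs.github.com/spec/...` line followed by `oid` and `size` lines). The function name `parse_lfs_pointer` and the `size` value in the sample are hypothetical and only there for illustration; the oid is the one quoted above.

```python
LFS_SPEC_PREFIX = b"version https://git-lfs.github.com/spec/"


def parse_lfs_pointer(data: bytes):
    """Return {'oid': ..., 'size': ...} if data looks like a Git LFS pointer, else None."""
    # Pointer files are tiny text files; anything large or without the
    # version line is treated as real content.
    if len(data) > 1024 or not data.startswith(LFS_SPEC_PREFIX):
        return None
    fields = {}
    for line in data.decode("utf-8", errors="replace").splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if "oid" not in fields or not fields.get("size", "").isdigit():
        return None
    return {"oid": fields["oid"], "size": int(fields["size"])}


# Sample pointer text: the oid is the one from this thread, the size is a
# made-up placeholder (assumption, not taken from the repository).
sample = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347\n"
    b"size 12345\n"
)
pointer = parse_lfs_pointer(sample)
not_pointer = parse_lfs_pointer(b"\x0a\x0e\x0a\x05<unk>")
```

To check a file on disk, pass `open("tokenizer.model", "rb").read()` to the function; a non-`None` result means the file is only a pointer and the real file still has to be fetched.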
Update: tested, and it works. It is indeed the Llama 2 tokenizer.model.
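One way to confirm that a downloaded file really matches the hash listed in the pointer is to compute its SHA-256 locally. This is only a sketch; the helper names `sha256_of_file` and `matches_lfs_oid` are made up for illustration, and the local path you pass in is up to you.

```python
import hashlib

# The oid quoted in the pointer file from this thread.
EXPECTED_OID = "sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347"


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_oid(path, oid=EXPECTED_OID):
    """Compare a file's SHA-256 with an LFS oid of the form 'sha256:<hex>'."""
    algo, _, expected_hex = oid.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return sha256_of_file(path) == expected_hex.lower()
```

Calling `matches_lfs_oid("tokenizer.model")` on the file downloaded from the Llama 2 repository should return `True` if it is the file the pointer refers to.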
Edit: Q8_0 wikitest perplexity currently comes out to 354.0950 +/- 2.40774, which sounds about right.
f32: 352.8210 +/- 2.39951
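As a rough sanity check on the two perplexity numbers, the gap between Q8_0 and f32 can be compared with the reported error bars. The sketch below treats the two errors as independent and combines them in quadrature, which is a simplifying assumption (both runs use the same test text, so the errors are likely correlated). The function name `compare_perplexity` is hypothetical.

```python
import math


def compare_perplexity(ppl_a, err_a, ppl_b, err_b):
    """Return (difference, combined error, whether the gap is within the combined error)."""
    diff = ppl_a - ppl_b
    # Quadrature sum of the two error bars (assumes independent errors).
    combined = math.hypot(err_a, err_b)
    return diff, combined, abs(diff) <= combined


# Q8_0 vs f32 values reported above.
diff, combined, within = compare_perplexity(354.0950, 2.40774, 352.8210, 2.39951)
print(f"difference = {diff:.4f}, combined error = {combined:.4f}, within error: {within}")
```

Under that assumption the Q8_0 result is within the error bars of the f32 result, so the quantization does not appear to change perplexity in a measurable way here.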
$ bin/main -m ../models/TinyLlama-1.1B-step-50K-105b/ggml-model-f32.gguf -t 10 -p "The meaning of life"
Log start
main: build = 1173 (e4386f4)
main: seed = 1693823704
llama_model_loader: loaded meta data with 17 key-value pairs and 201 tensors from ../models/TinyLlama-1.1B-step-50K-105b/ggml-model-f32.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor 0: output.weight f32 [ 2048, 32000, 1, 1 ]
llama_model_loader: - tensor 1: token_embd.weight f32 [ 2048, 32000, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 10: blk.0.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 11: blk.1.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 19: blk.1.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 20: blk.2.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 21: blk.2.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 22: blk.2.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 23: blk.2.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 24: blk.2.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 25: blk.2.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 26: blk.2.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 27: blk.2.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 28: blk.2.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 29: blk.3.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 30: blk.3.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 31: blk.3.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 32: blk.3.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 33: blk.3.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 34: blk.3.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 35: blk.3.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 36: blk.3.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 37: blk.3.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 38: blk.4.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 39: blk.4.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 40: blk.4.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 41: blk.4.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 42: blk.4.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 43: blk.4.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 44: blk.4.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 45: blk.4.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 46: blk.4.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 47: blk.5.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 48: blk.5.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 49: blk.5.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 50: blk.5.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 51: blk.5.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 52: blk.5.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 53: blk.5.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 54: blk.5.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 55: blk.5.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 56: blk.6.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 57: blk.6.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 58: blk.6.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 59: blk.6.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 60: blk.6.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 61: blk.6.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 62: blk.6.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 63: blk.6.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 64: blk.6.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 65: blk.7.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 66: blk.7.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 67: blk.7.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 68: blk.7.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 69: blk.7.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 70: blk.7.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 71: blk.7.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 72: blk.7.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 73: blk.7.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 74: blk.8.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 75: blk.8.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 76: blk.8.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 77: blk.8.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 78: blk.8.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 79: blk.8.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 80: blk.8.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 81: blk.8.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 82: blk.8.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 83: blk.9.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 84: blk.9.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 85: blk.9.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 86: blk.9.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 87: blk.9.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 88: blk.9.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 89: blk.9.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 90: blk.9.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 91: blk.9.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 92: blk.10.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 93: blk.10.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 94: blk.10.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 95: blk.10.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 96: blk.10.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 97: blk.10.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 98: blk.10.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 99: blk.10.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 100: blk.10.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 101: blk.11.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 102: blk.11.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 103: blk.11.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 104: blk.11.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 105: blk.11.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 106: blk.11.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 107: blk.11.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 108: blk.11.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 109: blk.11.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 110: blk.12.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 111: blk.12.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 112: blk.12.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 113: blk.12.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 114: blk.12.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 115: blk.12.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 116: blk.12.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 117: blk.12.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 118: blk.12.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 119: blk.13.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 120: blk.13.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 121: blk.13.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 122: blk.13.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 123: blk.13.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 124: blk.13.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 125: blk.13.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 126: blk.13.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 127: blk.13.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 128: blk.14.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 129: blk.14.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 130: blk.14.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 131: blk.14.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 132: blk.14.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 133: blk.14.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 134: blk.14.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 135: blk.14.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 136: blk.14.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 137: blk.15.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 138: blk.15.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 139: blk.15.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 140: blk.15.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 141: blk.15.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 142: blk.15.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 143: blk.15.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 144: blk.15.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 145: blk.15.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 146: blk.16.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 147: blk.16.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 148: blk.16.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 149: blk.16.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 150: blk.16.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 151: blk.16.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 152: blk.16.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 153: blk.16.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 154: blk.16.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 155: blk.17.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 156: blk.17.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 157: blk.17.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 158: blk.17.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 159: blk.17.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 160: blk.17.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 161: blk.17.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 162: blk.17.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 163: blk.17.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 164: blk.18.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 165: blk.18.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 166: blk.18.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 167: blk.18.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 168: blk.18.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 169: blk.18.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 170: blk.18.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 171: blk.18.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 172: blk.18.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 173: blk.19.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 174: blk.19.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 175: blk.19.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 176: blk.19.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 177: blk.19.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 178: blk.19.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 179: blk.19.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 180: blk.19.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 181: blk.19.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 182: blk.20.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 183: blk.20.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 184: blk.20.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 185: blk.20.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 186: blk.20.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 187: blk.20.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 188: blk.20.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 189: blk.20.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 190: blk.20.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 191: blk.21.attn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 192: blk.21.attn_q.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 193: blk.21.attn_k.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 194: blk.21.attn_v.weight f32 [ 2048, 256, 1, 1 ]
llama_model_loader: - tensor 195: blk.21.attn_output.weight f32 [ 2048, 2048, 1, 1 ]
llama_model_loader: - tensor 196: blk.21.ffn_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - tensor 197: blk.21.ffn_gate.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 198: blk.21.ffn_up.weight f32 [ 2048, 5632, 1, 1 ]
llama_model_loader: - tensor 199: blk.21.ffn_down.weight f32 [ 5632, 2048, 1, 1 ]
llama_model_loader: - tensor 200: output_norm.weight f32 [ 2048, 1, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: llama.context_length u32
llama_model_loader: - kv 3: llama.embedding_length u32
llama_model_loader: - kv 4: llama.block_count u32
llama_model_loader: - kv 5: llama.feed_forward_length u32
llama_model_loader: - kv 6: llama.rope.dimension_count u32
llama_model_loader: - kv 7: llama.attention.head_count u32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 10: tokenizer.ggml.model str
llama_model_loader: - kv 11: tokenizer.ggml.tokens arr
llama_model_loader: - kv 12: tokenizer.ggml.scores arr
llama_model_loader: - kv 13: tokenizer.ggml.token_type arr
llama_model_loader: - kv 14: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 15: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 16: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - type f32: 201 tensors
llm_load_print_meta: format = GGUF V2 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_ctx = 512
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 4
llm_load_print_meta: n_layer = 22
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: f_norm_eps = 1,0e-05
llm_load_print_meta: f_norm_rms_eps = 1,0e-05
llm_load_print_meta: n_ff = 5632
llm_load_print_meta: freq_base = 10000,0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = ?B
llm_load_print_meta: model ftype = all F32 (guessed)
llm_load_print_meta: model size = 1,10 B
llm_load_print_meta: general.name = models
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0,06 MB
llm_load_tensors: mem required = 4196,42 MB (+ 11,00 MB per state)
...........................................................................................
llama_new_context_with_model: kv self size = 11,00 MB
llama_new_context_with_model: compute buffer total size = 67,97 MB
system_info: n_threads = 10 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1,100000, presence_penalty = 0,000000, frequency_penalty = 0,000000, top_k = 40, tfs_z = 1,000000, top_p = 0,950000, typical_p = 1,000000, temp = 0,800000, mirostat = 0, mirostat_lr = 0,100000, mirostat_ent = 5,000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
The meaning of life and the meaning to be to be able to exist. I can say to have any knowledge and has the the opportunity to come in the experience a few days, a little time as long as far.
the most probably was to me. The most and the most difficult one of in, you. To have been with them, the next that it is not a lot on the fact so and would think the more than the other is
Thanks for spotting this! I guess Hugging Face might substitute the tokenizer with a soft link to save storage. I can load it successfully with AutoModel.from_pretrained(), but it might not work with git clone. Will update the file!
I have uploaded the correct tokenizer. Will close this issue. Feel free to reopen it if the problem persists.