RoPE frequency

#1
by Silver267 - opened

The transformers repo suggests that this model uses a RoPE frequency base of 1,000,000. However, there is no "qwen2.rope.freq_base" value in the metadata according to gguf-dump.

Output of gguf-dump:

* Loading: qwen1_5-72b-chat-q5_k_m.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.

* Dumping 23 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 963
      3: UINT64     |        1 | GGUF.kv_count = 20
      4: STRING     |        1 | general.architecture = 'qwen2'
      5: STRING     |        1 | general.name = 'Qwen2-beta-72B-Chat'
      6: UINT32     |        1 | qwen2.block_count = 80
      7: UINT32     |        1 | qwen2.context_length = 32768
      8: UINT32     |        1 | qwen2.embedding_length = 8192
      9: UINT32     |        1 | qwen2.feed_forward_length = 24576
     10: UINT32     |        1 | qwen2.attention.head_count = 64
     11: UINT32     |        1 | qwen2.attention.head_count_kv = 64
     12: FLOAT32    |        1 | qwen2.attention.layer_norm_rms_epsilon = 9.999999974752427e-07
     13: BOOL       |        1 | qwen2.use_parallel_residual = True
     14: STRING     |        1 | tokenizer.ggml.model = 'gpt2'
     15: [STRING]   |   152064 | tokenizer.ggml.tokens
     16: [INT32]    |   152064 | tokenizer.ggml.token_type
     17: [STRING]   |   151387 | tokenizer.ggml.merges
     18: UINT32     |        1 | tokenizer.ggml.eos_token_id = 151643
     19: UINT32     |        1 | tokenizer.ggml.padding_token_id = 151643
     20: UINT32     |        1 | tokenizer.ggml.bos_token_id = 151643
     21: STRING     |        1 | tokenizer.chat_template = "{% for message in messages %}{{'<|im_start|>' + message['rol"
     22: UINT32     |        1 | general.quantization_version = 2
     23: UINT32     |        1 | general.file_type = 17

Yeah, the v1.5 models you can pull from https://ollama.ai/library/qwen are missing their RoPE frequency base too.

I've patched my Ollama build to allow setting rope_frequency_base in the Modelfile again, so I can fix this via:

PARAMETER rope_frequency_base 1000000

but it should also be possible to use gguf-set-metadata to do the same.
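
Untested on my end, but I'd expect something along these lines to work with the script shipped in llama.cpp's gguf-py (the exact path may differ between checkouts; also note that gguf-set-metadata appears to patch values in place, so it may only help when the key is already present in the file):

python llama.cpp/gguf-py/scripts/gguf-set-metadata.py qwen1_5-72b-chat-q5_k_m.gguf qwen2.rope.freq_base 1000000

If you're running llama.cpp directly, there's also the --rope-freq-base 1000000 load-time override, which needs no file edits at all.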

I can confirm this does seem to work: without the setting it just ends up outputting repeating newlines after a while - likely because the loader falls back to the default of 10,000, which makes the context 'appear' to fill up 100x quicker to the model. Hopefully this gets fixed soon, as I bet a lot of people are running into this problem ( @TheBloke or @LoneStriker will hopefully soon upload a version with the correct value baked in).
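
For the curious, the rough intuition (this is my reading of how RoPE works, so treat it as a sketch rather than gospel): at position $m$, each pair of head dimensions $i$ is rotated by the angle $m\theta_i$, where

$$\theta_i = \text{base}^{-2i/d}$$

so every rotation wavelength scales with the base. Dropping the base from 1,000,000 to 10,000 shrinks the slowest wavelengths by roughly 100x, which is exactly the 'fills up 100x quicker' effect above.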

Qwen org

Yes, I have fixed this. I am now also asking Ollama to follow my setup.
