Update config.json #24
opened by mklf
WizardCoder-Python-34B-V1.0 was trained with transformers 4.31.0. In 4.31.0, rope_theta is not used when initializing the rotary embedding, so the default value of the base parameter is used. Newer versions of transformers (4.33.0 was tested) pass rope_theta to the base parameter, which is 1000000 in the config file.
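A quick way to see which behavior your environment gives you is to compare the installed transformers version against the rope_theta written in config.json. This is only a minimal sketch: it assumes config.json sits in the current directory, and it uses 4.33.0 as the cutoff simply because that is the version tested above, not necessarily the exact release where the behavior changed.

import json
import transformers
from packaging import version

# rope_theta as written in this repo's config.json (assumed to be in the current directory)
with open("config.json") as f:
    rope_theta = json.load(f).get("rope_theta")
print(f"config rope_theta: {rope_theta}")

# 4.31.0 ignores rope_theta and falls back to the default base;
# 4.33.0 (the version tested above) passes it through to the rotary embedding.
if version.parse(transformers.__version__) < version.parse("4.33.0"):
    print(f"transformers {transformers.__version__}: rope_theta may be ignored (default base used)")
else:
    print(f"transformers {transformers.__version__}: rope_theta is passed as the RoPE base")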
Initialization in 4.31.0:
def _init_rope(self):
    if self.config.rope_scaling is None:
        self.rotary_emb = LlamaRotaryEmbedding(self.head_dim, max_position_embeddings=self.max_position_embeddings)
    else:
        scaling_type = self.config.rope_scaling["type"]
        scaling_factor = self.config.rope_scaling["factor"]
        if scaling_type == "linear":
            self.rotary_emb = LlamaLinearScalingRotaryEmbedding(
                self.head_dim, max_position_embeddings=self.max_position_embeddings, scaling_factor=scaling_factor
            )
        elif scaling_type == "dynamic":
            self.rotary_emb = LlamaDynamicNTKScalingRotaryEmbedding(
                self.head_dim, max_position_embeddings=self.max_position_embeddings, scaling_factor=scaling_factor
            )
        else:
            raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
Because _init_rope above never passes a base argument, the default base value of 10000 is used:
class LlamaRotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        ...
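For comparison, in newer transformers (4.33-era code, sketched from memory here and possibly differing in minor details), _init_rope forwards the configured value, so a rope_theta of 1000000 in config.json actually reaches the rotary embedding:

def _init_rope(self):
    if self.config.rope_scaling is None:
        # rope_theta from the config is forwarded as the RoPE base
        self.rotary_emb = LlamaRotaryEmbedding(
            self.head_dim,
            max_position_embeddings=self.max_position_embeddings,
            base=self.rope_theta,
        )
    else:
        ...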
WizardLM changed pull request status to closed