Update config.json #24
opened by mklf
WizardCoder-Python-34B-V1.0 was trained with transformers 4.31.0. In 4.31.0, rope_theta is not used when initializing the rotary embedding, so the default value of the base parameter is used. Newer versions of transformers (4.33.0 was tested) pass rope_theta to the base parameter, which is 1000000 in the config file.
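A quick way to see which behavior your environment gives you is to compare the installed transformers version against the rope_theta written in config.json. This is only a minimal sketch: it assumes config.json sits in the current directory, and it uses 4.33.0 as the cutoff simply because that is the version tested above, not necessarily the exact release where the behavior changed.

import json
import transformers
from packaging import version

# rope_theta as written in this repo's config.json (assumed to be in the current directory)
with open("config.json") as f:
    rope_theta = json.load(f).get("rope_theta")
print(f"config rope_theta: {rope_theta}")

# 4.31.0 ignores rope_theta and falls back to the default base;
# 4.33.0 (the version tested above) passes it through to the rotary embedding.
if version.parse(transformers.__version__) < version.parse("4.33.0"):
    print(f"transformers {transformers.__version__}: rope_theta may be ignored (default base used)")
else:
    print(f"transformers {transformers.__version__}: rope_theta is passed as the RoPE base")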
Initialization in 4.31.0:
def _init_rope(self):
    if self.config.rope_scaling is None:
        self.rotary_emb = LlamaRotaryEmbedding(self.head_dim, max_position_embeddings=self.max_position_embeddings)
    else:
        scaling_type = self.config.rope_scaling["type"]
        scaling_factor = self.config.rope_scaling["factor"]
        if scaling_type == "linear":
            self.rotary_emb = LlamaLinearScalingRotaryEmbedding(
                self.head_dim, max_position_embeddings=self.max_position_embeddings, scaling_factor=scaling_factor
            )
        elif scaling_type == "dynamic":
            self.rotary_emb = LlamaDynamicNTKScalingRotaryEmbedding(
                self.head_dim, max_position_embeddings=self.max_position_embeddings, scaling_factor=scaling_factor
            )
        else:
            raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
Because _init_rope above never passes a base argument, the default base value of 10000 is used:
class LlamaRotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        ...
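For comparison, in newer transformers (4.33-era code, sketched from memory here and possibly differing in minor details), _init_rope forwards the configured value, so a rope_theta of 1000000 in config.json actually reaches the rotary embedding:

def _init_rope(self):
    if self.config.rope_scaling is None:
        # rope_theta from the config is forwarded as the RoPE base
        self.rotary_emb = LlamaRotaryEmbedding(
            self.head_dim,
            max_position_embeddings=self.max_position_embeddings,
            base=self.rope_theta,
        )
    else:
        ...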
WizardLM changed pull request status to closed