Fix vision_config.model_type
This PR fixes the incorrect `model_type` in `vision_config`, so the vision model can be loaded on its own instead of instantiating the entire GLM-4V model.
A simple reproduction script is as follows:
```python
import torch
from transformers import Glm4vForConditionalGeneration


def main():
    model = Glm4vForConditionalGeneration.from_pretrained(
        "THUDM/GLM-4.1V-9B-Thinking",
        dtype="auto",
        device_map="auto",
    )
    visual = model.model.visual
    language = model.model.language_model
    print(f"type(model.model.visual): {type(visual)}")
    print(f"type(model.model.language_model): {type(language)}")
    assert visual.__class__.__name__ == "Glm4vVisionModel", (
        "vision_config mistakenly sets model_type='glm4v', so AutoModel.from_config "
        "instantiates the full Glm4vModel (visual + language) instead of the pure "
        "vision backbone Glm4vVisionModel."
    )


if __name__ == "__main__":
    main()
```
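The root cause can also be seen by inspecting the nested config directly (a minimal sketch; it only reads the config and assumes network access to the Hub):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("THUDM/GLM-4.1V-9B-Thinking")

# The nested vision config reports the composite model's type ("glm4v"),
# so anything that dispatches on model_type (e.g. AutoModel.from_config)
# resolves it to the full Glm4vModel instead of Glm4vVisionModel.
print(cfg.vision_config.model_type)
```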
The same issue was also fixed in the PR for GLM-4.6, which required renaming the `model_type` as shown there. This field will therefore be updated when transformers 5.0.0 is released; in transformers 4.57.1 the `glm4_vision` model type does not yet exist, so this PR will not be merged for now.
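Until the renamed `model_type` is available, one possible workaround on 4.57.x is to bypass Auto dispatch and construct the vision backbone class directly (a sketch; the import path below is an assumption based on the usual transformers module layout, and constructing from a config yields randomly initialized weights):

```python
from transformers import AutoConfig
from transformers.models.glm4v.modeling_glm4v import Glm4vVisionModel

cfg = AutoConfig.from_pretrained("THUDM/GLM-4.1V-9B-Thinking")

# Instantiate the vision backbone class directly instead of relying on
# model_type dispatch; weights are randomly initialized at this point.
visual = Glm4vVisionModel(cfg.vision_config)
print(type(visual).__name__)  # "Glm4vVisionModel"
```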
Understood! Thanks for the clarification. I'll close this PR.