Fix vision_config.model_type

#21

Fixed the incorrect model_type in vision_config. With this change, the visual model is loaded correctly on its own instead of pulling in the entire GLM-4V model.
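For context, AutoModel.from_config picks the class to instantiate from config.model_type, so a wrong value in the nested vision_config makes the vision tower load as the full model. A minimal sketch of the dispatch, using the same checkpoint as the script below (the printed value is what the unfixed config reports):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("THUDM/GLM-4.1V-9B-Thinking")
# Before this PR, the nested vision config carries the top-level model_type,
# so anything dispatching on it resolves to the full Glm4vModel.
print(config.vision_config.model_type)  # "glm4v" on the unfixed checkpoint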

A simple reproduction script is as follows:

from transformers import Glm4vForConditionalGeneration


def main():
    # Load the full GLM-4.1V checkpoint; dtype and device placement follow the config.
    model = Glm4vForConditionalGeneration.from_pretrained(
        "THUDM/GLM-4.1V-9B-Thinking",
        dtype="auto",
        device_map="auto",
    )

    visual = model.model.visual
    language = model.model.language_model

    print(f"type(model.model.visual): {type(visual)}")
    print(f"type(model.model.language_model): {type(language)}")

    assert visual.__class__.__name__ == "Glm4vVisionModel", (
        "vision_config mistakenly sets model_type='glm4v', so AutoModel.from_config "
        "instantiates the full Glm4vModel (visual + language) instead of the pure "
        "vision backbone Glm4vVisionModel."
    )


if __name__ == "__main__":
    main()
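On the unfixed checkpoint this script fails with the AssertionError above; once vision_config.model_type is corrected, the assertion passes and model.model.visual is the pure vision backbone.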

Hi @merve, please help review. Thanks!

This issue was also fixed in the PR for GLM-4.6, which renames the model_type as shown there. The field will therefore be updated when transformers 5.0.0 is released; in transformers 4.57.1, glm4_vision does not yet exist, so this will not be merged for now.
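To see why the rename cannot land yet, one can check whether the installed transformers knows the new model_type; a minimal check (CONFIG_MAPPING is the internal registry that AutoConfig dispatches on, so this relies on implementation details rather than public API):

from transformers.models.auto.configuration_auto import CONFIG_MAPPING

# "glm4v" is registered in transformers 4.57.1, while "glm4_vision" is
# expected only from transformers 5.0.0 onward (per the comment above).
print("glm4v" in CONFIG_MAPPING)        # True on 4.57.1
print("glm4_vision" in CONFIG_MAPPING)  # False on 4.57.1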

Understood! Thanks for the clarification. I'll close this PR.

YangKai0616 changed pull request status to closed
