Missing steps

#8
by Martins6 - opened

Thanks for the awesome project and for sharing the weights! You guys rock!

In the llava module, the load_pretrained_model function has the following line:

model = LlavaQwenForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, attn_implementation=attn_implementation, **kwargs)

However, the class it's calling can't be found. I know this may be a llava problem, but maybe you can point me to a solution? Otherwise, your code seems unusable at the moment.
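
For reference, in the LLaVA-NeXT repo the class seems to be defined in llava/model/language_model/llava_qwen.py, so (assuming the llava package is installed from that repo) the import would look like this:

from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM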

Martins6 changed discussion status to closed

So I inspected it further. It seems there are just some missing steps.

First, I had to install all of these packages; it would be nice to document this (there's a quick import check after the list):

"accelerate>=1.0.1",
"av>=13.1.0",
"boto3>=1.35.46",
"decord>=0.6.0",
"einops>=0.6.0",
"flash-attn",
"llava",
"open-clip-torch>=2.28.0",
"transformers>=4.45.2",

Second, the load_pretrained_model function simply stopped working when loading the Qwen model.
I had to write a new function to load everything that was necessary:

import torch
from transformers import AutoTokenizer

# These come from the LLaVA-NeXT ("llava") package.
from llava.constants import (
    DEFAULT_IMAGE_PATCH_TOKEN,
    DEFAULT_IM_START_TOKEN,
    DEFAULT_IM_END_TOKEN,
)
from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM


def load_model():
    model_name = "llava_qwen"
    device_map = "auto"

    model_path = "lmms-lab/LLaVA-Video-7B-Qwen2"
    attn_implementation = None  # set to "flash_attention_2" if flash-attn is installed
    kwargs = {"device_map": device_map, "torch_dtype": torch.float16}

    tokenizer = AutoTokenizer.from_pretrained(model_path)

    model = LlavaQwenForCausalLM.from_pretrained(
        model_path,
        low_cpu_mem_usage=True,
        attn_implementation=attn_implementation,
        **kwargs,
    )

    image_processor = None
    if "llava" in model_name.lower():
        # Register the extra multimodal tokens the checkpoint expects.
        mm_use_im_start_end = getattr(model.config, "mm_use_im_start_end", False)
        mm_use_im_patch_token = getattr(model.config, "mm_use_im_patch_token", True)
        if mm_use_im_patch_token:
            tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True)
        if mm_use_im_start_end:
            tokenizer.add_tokens(
                [DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True
            )
        model.resize_token_embeddings(len(tokenizer))

        # Make sure the vision tower is loaded on the right device.
        vision_tower = model.get_vision_tower()
        if not vision_tower.is_loaded:
            vision_tower.load_model(device_map=device_map)
        if device_map != "auto":
            vision_tower.to(device="cuda", dtype=torch.float16)
        image_processor = vision_tower.image_processor

    return model, tokenizer, image_processor
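
For completeness, here's a rough sketch of how I then run a single video through the model. The helpers (tokenizer_image_token, conv_templates, DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX) come from the llava package, the "qwen_1_5" conversation template and the uniform frame sampling are just what worked for me, so treat this as a starting point rather than the official recipe:

import numpy as np
import torch
from decord import VideoReader, cpu
from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates
from llava.mm_utils import tokenizer_image_token


def run_video_inference(model, tokenizer, image_processor,
                        video_path, question, num_frames=16):
    # Uniformly sample frames from the video with decord.
    vr = VideoReader(video_path, ctx=cpu(0))
    idx = np.linspace(0, len(vr) - 1, num_frames).astype(int).tolist()
    frames = vr.get_batch(idx).asnumpy()  # (num_frames, H, W, 3)

    # Preprocess the frames and move them to the model's device/dtype.
    video = image_processor.preprocess(frames, return_tensors="pt")["pixel_values"]
    video = video.to(model.device, dtype=torch.float16)

    # Build the prompt with the image placeholder token.
    conv = conv_templates["qwen_1_5"].copy()
    conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\n" + question)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()

    input_ids = tokenizer_image_token(
        prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
    ).unsqueeze(0).to(model.device)

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=[video],
            modalities=["video"],
            do_sample=False,
            max_new_tokens=256,
        )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


model, tokenizer, image_processor = load_model()
print(run_video_inference(model, tokenizer, image_processor,
                          "my_video.mp4", "Please describe this video in detail."))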
Martins6 changed discussion status to open
Martins6 changed discussion title from LlavaQwenForCausalLM not found. to tokenizer_image_token not working
Martins6 changed discussion title from tokenizer_image_token not working to Missing steps

Hi, I successfully ran the inference code with the 7B model, but encountered an issue when switching to the 32B model. Have you experienced any problems running the 32B model?

hey @RachelZhou , I don't have enough compute to test that :/
But if I do, I'll report back to you! Hope the tips I gave here help you out on your project.

hey @RachelZhou , I did try it and got some buggy results too. I don't have the traceback, unfortunately. But the 72B model runs super smoothly! Hope it helps!

I could run the original code once I ensured flash-attn was successfully installed!
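
For reference, with flash-attn available that just means passing the standard transformers flag to the same call, roughly as below (adjust dtype/device_map for your setup):

model = LlavaQwenForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
    device_map="auto",
)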

Thank you for sharing your experience!!
