Missing steps
Thanks for the awesome project and for sharing the weights! You guys rock!
In the llava module, the load_pretrained_model function has the following line:
```python
model = LlavaQwenForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, attn_implementation=attn_implementation, **kwargs)
```
However, the LlavaQwenForCausalLM class it calls is nowhere to be found. I know this may be a llava problem, but maybe you can point to a solution? Otherwise, your code seems to be currently unusable.
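For anyone hitting the same import error, a minimal sketch of the import that resolved it for me, assuming the LLaVA-NeXT repository layout:

```python
# Assumption: the llava package is installed from the LLaVA-NeXT repo,
# where LlavaQwenForCausalLM is defined in llava/model/language_model/llava_qwen.py
from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM
```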
Guys, so I inspected it further. It seems there are just some missing steps.
First, I had to install all of these packages; it would be nice to document them:
"accelerate>=1.0.1",
"av>=13.1.0",
"boto3>=1.35.46",
"decord>=0.6.0",
"einops>=0.6.0",
"flash-attn",
"llava",
"open-clip-torch>=2.28.0",
"transformers>=4.45.2",
Second, the load_pretrained_model function simply stopped working when loading the Qwen model. I had to write a new function to load everything that was necessary:
```python
import torch
from transformers import AutoTokenizer

from llava.constants import (
    DEFAULT_IMAGE_PATCH_TOKEN,
    DEFAULT_IM_START_TOKEN,
    DEFAULT_IM_END_TOKEN,
)
from llava.model.language_model.llava_qwen import LlavaQwenForCausalLM


def load_model():
    model_name = "llava_qwen"
    device_map = "auto"
    model_path = "lmms-lab/LLaVA-Video-7B-Qwen2"
    attn_implementation = None  # set to "flash_attention_2" if flash-attn is installed
    kwargs = {"device_map": "auto", "torch_dtype": torch.float16}

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = LlavaQwenForCausalLM.from_pretrained(
        model_path,
        low_cpu_mem_usage=True,
        attn_implementation=attn_implementation,
        **kwargs,
    )

    # Register the multimodal special tokens the config expects and
    # resize the embeddings to match the enlarged vocabulary.
    if "llava" in model_name.lower():
        mm_use_im_start_end = getattr(model.config, "mm_use_im_start_end", False)
        mm_use_im_patch_token = getattr(model.config, "mm_use_im_patch_token", True)
        if mm_use_im_patch_token:
            tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True)
        if mm_use_im_start_end:
            tokenizer.add_tokens(
                [DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True
            )
        model.resize_token_embeddings(len(tokenizer))

    # Load the vision tower weights if from_pretrained didn't already do so.
    vision_tower = model.get_vision_tower()
    if not vision_tower.is_loaded:
        vision_tower.load_model(device_map=device_map)
    if device_map != "auto":
        vision_tower.to(device="cuda", dtype=torch.float16)
    image_processor = vision_tower.image_processor

    return model, tokenizer, image_processor
```
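Calling it is then just (hypothetical usage; the video preprocessing and generation steps follow the repo's inference example):

```python
model, tokenizer, image_processor = load_model()
model.eval()  # inference only, no gradients needed
```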
Hi, I successfully ran the inference code with the 7B model, but encountered an issue when switching to the 32B model. Have you experienced any problems running the 32B model?
Hey @RachelZhou, I don't have enough compute to test that :/ But if I do, I'll report back to you! Hope the tips I gave here help you out on your project.
Hey @RachelZhou, I did try it and got some buggy results too. I don't have the traceback, unfortunately. But the 72B model runs super smoothly! Hope it helps!
I could run the original code once I ensured flash-attn was successfully installed!
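For anyone else landing here, a minimal sketch of that original entry point, assuming the LLaVA-NeXT builder signature (it returns the context length as a fourth value):

```python
from llava.model.builder import load_pretrained_model

# model_base is None since we load full weights rather than a LoRA delta
tokenizer, model, image_processor, context_len = load_pretrained_model(
    "lmms-lab/LLaVA-Video-7B-Qwen2",  # model_path
    None,                             # model_base
    "llava_qwen",                     # model_name
    device_map="auto",
)
```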
Thank you for sharing your experience!!