• Deploying with Gradio: follow the MoE-LLaVA serving tutorial, or, for a quick start, use the command below 👇👇👇
deepspeed --include localhost:0 moellava/serve/gradio_web_server.py \
    --model-path="tuanio/moe-llava-qwen1.5-0.5b-vista_reason_conv-1ep"

Or run inference from the CLI:

deepspeed --include localhost:0 moellava/serve/cli.py \
    --model-path "tuanio/moe-llava-qwen1.5-0.5b-vista_reason_conv-1ep" \
    --image-file "data/llm_data/coco2017/train2017/000000391895.jpg"

• Training script:
moe_mode="sparse"
num_experts=4
top_k_experts=2
use_residual=False
router_aux_loss_coef=0.01

ROOT_DATA=data/llm_data

WANDB_PROJECT=chart-vision-llm CUDA_VISIBLE_DEVICES=0,1,2,3,4 deepspeed --include localhost:2,3,4 moellava/train/train_mem.py \
    --moe_enable True --num_experts ${num_experts} --top_k_experts ${top_k_experts} --capacity_factor 1.5 \
    --moe_mode ${moe_mode} --use_residual ${use_residual} --router_aux_loss_coef ${router_aux_loss_coef} \
    --train_modules mlp.gate_proj mlp.up_proj mlp.down_proj wg \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path ./checkpoints/ft-llava-qwen1.5-0.5b-vista_llava-merged-2ep \
    --version qwen \
    --data_path $ROOT_DATA/json_files/vista_reason_conversation.json \
    --image_folder $ROOT_DATA/coco2017/train2017 \
    --image_tower google/siglip-base-patch16-256-multilingual \
    --image_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/ft-moe-llava-qwen1.5-0.5b-vista_reason_conv-1ep \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 24000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 50 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 8 \
    --lazy_preprocess True \
    --report_to wandb \
    --cache_dir "./cache_dir" \
    --run_name ft-moe-llava-qwen1.5-0.5b-vista_reason_conv-1ep 
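
The effective (global) batch size implied by the training command follows from three of its values: the GPUs listed in --include, --per_device_train_batch_size, and --gradient_accumulation_steps. The snippet below is a minimal sketch of that arithmetic, assuming plain data parallelism across the GPUs selected by --include localhost:2,3,4. The variable names are illustrative and not part of the MoE-LLaVA scripts.

```shell
#!/bin/sh
# Values copied from the training command above.
include="localhost:2,3,4"
per_device_train_batch_size=8
gradient_accumulation_steps=2

# Drop the "host:" prefix, leaving the comma-separated GPU ids.
gpu_list="${include#*:}"

# Count the GPU ids: one per line after replacing commas with newlines.
num_gpus=$(( $(printf '%s\n' "$gpu_list" | tr ',' '\n' | wc -l) ))

# Assumption: each selected GPU processes its own micro-batch (data parallelism).
effective_batch=$(( num_gpus * per_device_train_batch_size * gradient_accumulation_steps ))

echo "GPUs: ${num_gpus}, effective batch size: ${effective_batch}"
```

With the values above this gives 3 GPUs and an effective batch size of 48 samples per optimizer step. If the GPU list or either batch setting changes, the learning rate of 2e-5 may need to be revisited.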
• Model details: Safetensors format, 1.03B params, BF16 tensor type

Dataset used to train tuanio/moe-llava-qwen1.5-0.5b-vista_reason_conv-1ep