---
license: apache-2.0
datasets:
- tabtoyou/KoLLaVA-Instruct-150k
- tabtoyou/KoLLaVA-CC3M-Pretrain-595K
language:
- ko
library_name: transformers
tags:
- LLaVA
- KoVicuna
- KoLLaVA
- KoAlpaca
---
# KoLLaVA: Korean Large Language and Vision Assistant (feat. LLaVA)
This model is a large multimodal model (LMM) that combines the visual encoder of CLIP (ViT-14) with the LLM KoVicuna, trained on a Korean visual-instruction dataset.

Detailed code is available in the KoLLaVA GitHub repository.
## Training hyperparameters
- learning_rate: 2e-5
- train_batch_size: 16
- distributed_type: multi-GPU (A100 80G)
- num_devices: 4
- gradient_accumulation_steps: 1
- total_train_batch_size: 64
- total_eval_batch_size: 16
- lr_scheduler_type: cosine
- num_epochs: 1
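As a sanity check, the total train batch size reported above follows directly from the per-device batch size, the number of devices, and the gradient accumulation steps:

```python
# Effective (total) train batch size:
# per-device batch size x number of devices x gradient accumulation steps
train_batch_size = 16
num_devices = 4
gradient_accumulation_steps = 1

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # → 64
```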