---
license: cc-by-nc-4.0
---

## KoLLaVA: Korean Large Language and Vision Assistant (feat. LLaVA)

This model is a large multimodal model (LMM) that combines an LLM (LLaMA-2-7b-ko) with the visual encoder of CLIP (ViT-L/14), trained on a Korean visual-instruction dataset using QLoRA. The full code is available in the [KoLLaVA](https://github.com/tabtoyou/KoLLaVA/tree/main) GitHub repository.

- Training hyperparameters
  - learning_rate: 2e-4
  - train_batch_size: 16
  - distributed_type: multi-GPU (RTX 3090, 24 GB)
  - num_devices: 4
  - gradient_accumulation_steps: 2
  - total_train_batch_size: 128
  - total_eval_batch_size: 4
  - lr_scheduler_type: cosine
  - num_epochs: 1
  - lora_enable: True
  - bits: 4

Model license: cc-by-nc-4.0
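
As a rough illustration, the hyperparameters above map onto a QLoRA configuration with Hugging Face `transformers` and `peft` along the following lines. This is a minimal sketch, not the repository's actual training script: the LoRA rank, alpha, dropout, and target modules are assumptions not stated in this card, so refer to the KoLLaVA repository for the exact settings.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization (bits: 4), as used in QLoRA fine-tuning
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumption: NF4 is the usual QLoRA choice
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter (lora_enable: True); r, alpha, dropout, and targets are
# illustrative assumptions, not values from this model card
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Batch settings from the card: 16 per device x 4 GPUs x 2 accumulation steps = 128 total
training_args = TrainingArguments(
    output_dir="kollava-qlora",
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
)
```

Note that the per-device batch size, device count, and accumulation steps multiply to the total_train_batch_size of 128 listed above (16 × 4 × 2 = 128).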