This model replaces the Linear layers (excluding the CrossAttention layers) of meta-llama/Llama-3.2-11B-Vision-Instruct with the corresponding layers from tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1.
meta-llama/Llama-3.2-11B-Vision-Instruct
tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
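The Linear-layer swap described above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the procedure actually used to build this model. It assumes three things: both checkpoints use the standard Llama parameter names (`self_attn.{q,k,v,o}_proj`, `mlp.{gate,up,down}_proj`); the state-dict prefixes are `language_model.model.layers` and `model.layers` (these differ across `transformers` versions); and the non-cross-attention layers of the vision model line up, in order, with the layers of the 8B text model. The helper names `build_layer_map` and `replace_linear_weights` are illustrative only.

```python
from typing import Dict, List, Mapping, MutableMapping

# Linear projections inside each Llama decoder layer (attention q/k/v/o and the
# gated MLP). Assumption: both checkpoints use these parameter names.
LINEAR_SUFFIXES = (
    "self_attn.q_proj.weight",
    "self_attn.k_proj.weight",
    "self_attn.v_proj.weight",
    "self_attn.o_proj.weight",
    "mlp.gate_proj.weight",
    "mlp.up_proj.weight",
    "mlp.down_proj.weight",
)


def build_layer_map(num_target_layers: int, cross_attention_layers: List[int]) -> Dict[int, int]:
    """Map each self-attention layer of the vision model to a text-only layer.

    Cross-attention layers are skipped entirely; the remaining layers are
    assumed to correspond, in order, to the layers of the text-only model.
    """
    cross = set(cross_attention_layers)
    layer_map: Dict[int, int] = {}
    source_idx = 0
    for target_idx in range(num_target_layers):
        if target_idx in cross:
            continue  # cross-attention layers keep their original weights
        layer_map[target_idx] = source_idx
        source_idx += 1
    return layer_map


def replace_linear_weights(
    target_sd: MutableMapping[str, object],
    source_sd: Mapping[str, object],
    layer_map: Dict[int, int],
    target_prefix: str = "language_model.model.layers",  # assumed prefix
    source_prefix: str = "model.layers",  # assumed prefix
) -> int:
    """Copy Linear weights from source_sd into target_sd in place.

    Returns the number of tensors replaced. Raises KeyError if an expected
    key is missing from either state dict.
    """
    replaced = 0
    for target_idx, source_idx in layer_map.items():
        for suffix in LINEAR_SUFFIXES:
            t_key = f"{target_prefix}.{target_idx}.{suffix}"
            s_key = f"{source_prefix}.{source_idx}.{suffix}"
            if t_key not in target_sd:
                raise KeyError(f"missing target key: {t_key}")
            target_sd[t_key] = source_sd[s_key]
            replaced += 1
    return replaced
```

For example, with a 40-layer text decoder whose cross-attention layers sit at indices 3, 8, 13, 18, 23, 28, 33, 38 (assumed values; read them from `config.text_config.cross_attention_layers` in practice), `build_layer_map` yields 32 entries, mapping vision layer 4 to text layer 3 and layer 39 to text layer 31. With PyTorch, the dictionaries would come from each model's `state_dict()`, and the modified dictionary would be loaded back with `load_state_dict()`. The copy only makes sense if the two text backbones share the same hidden and intermediate sizes, so that every replaced tensor has a matching shape.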