This model is a version of Qwen2-72B-Instruct with improved Korean performance.
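Since the model inherits the Qwen2 chat format from its base model, it can presumably be used with Hugging Face `transformers` like any Qwen2-72B-Instruct checkpoint. The snippet below is a minimal, illustrative sketch; the prompt and generation settings are placeholders, not values from this card.

```python
# Minimal usage sketch, assuming the standard Qwen2 chat template
# inherited from Qwen2-72B-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "denial07/Qwen2-72B-Instruct-kor-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 72B parameters: expect multi-GPU or quantization
    device_map="auto",
)

messages = [
    {"role": "user", "content": "대한민국의 수도는 어디인가요?"},  # "What is the capital of South Korea?"
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```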
## LogicKor Benchmark (24.07.31)
- The rankings below are based on 1-shot evaluation; a sketch of how a 1-shot prompt is assembled follows the table.
| Rank | Model | Reasoning | Math | Writing | Coding | Understanding | Grammar | Single-turn | Multi-turn | Total | Parameters |
|-----:|-------|----------:|-----:|--------:|-------:|--------------:|--------:|------------:|-----------:|------:|-----------:|
| 1 | openai/gpt-4o-2024-05-13 | 9.21 | 8.71 | 9.64 | 9.78 | 9.64 | 9.50 | 9.33 | 9.50 | 9.41 | ? |
| 2 | anthropic/claude-3-5-sonnet-20240620 | 8.64 | 8.42 | 9.85 | 9.78 | 9.92 | 9.21 | 9.26 | 9.35 | 9.30 | ? |
| 4 | mistralai/Mistral-Large-Instruct-2407 | 9.71 | 9.07 | 9.57 | 9.92 | 9.92 | 6.78 | 9.19 | 9.14 | 9.16 | 123B |
| 8 | meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 | 8.78 | 7.14 | 9.28 | 9.64 | 9.64 | 8.57 | 8.97 | 8.71 | 8.84 | 405B |
| 9 | **denial07/Qwen2-72B-Instruct-kor-dpo** | 8.85 | 8.21 | 9.14 | 9.71 | 9.64 | 7.21 | 8.88 | 8.71 | 8.79 | 72B |
| 10 | Qwen/Qwen2-72B-Instruct | 8.00 | 8.14 | 9.07 | 9.85 | 9.78 | 7.28 | 8.61 | 8.76 | 8.69 | 72B |
| 11 | google/gemini-1.5-pro-001 | 7.00 | 8.00 | 9.57 | 8.85 | 9.35 | 8.64 | 8.61 | 8.52 | 8.57 | ? |
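Here, "1-shot" means each test question is preceded by a single worked example in the conversation. A minimal sketch of how such a prompt can be assembled (the demonstration and test question below are invented for illustration, not items from LogicKor):

```python
# Hypothetical sketch of a 1-shot chat prompt: one demonstration turn pair
# precedes the question that is actually scored. The contents are invented.
def build_one_shot_messages(shot_question, shot_answer, test_question):
    return [
        {"role": "user", "content": shot_question},    # the single demonstration
        {"role": "assistant", "content": shot_answer},
        {"role": "user", "content": test_question},    # the question being scored
    ]

messages = build_one_shot_messages(
    shot_question="3의 배수 중 가장 작은 두 자리 수는 무엇인가요?",
    shot_answer="3의 배수 중 가장 작은 두 자리 수는 12입니다.",
    test_question="피보나치 수를 계산하는 파이썬 함수를 작성하세요.",
)
# `messages` can then be passed to tokenizer.apply_chat_template,
# as in the loading example above.
```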
## KMMLU Benchmark
- Accuracy on the HAERAE-HUB/KMMLU benchmark; a sketch of the accuracy computation follows the table.
| Category | Qwen2-72B-it-kor-dpo (this model) | Qwen2-72B-it | Mistral-Large-it-2407 | Questions |
|----------|----------------------------------:|-------------:|----------------------:|----------:|
| HUMSS | 0.63 | 0.63 | 0.62 | 5130 |
| STEM | 0.59 | 0.59 | 0.57 | 9900 |
| Applied Science | 0.56 | 0.56 | 0.54 | 11600 |
| Other | 0.58 | 0.58 | 0.54 | 8400 |
| **Overall Accuracy** | **0.58** | **0.58** | **0.56** | 35030 |
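A rough sketch of how per-category accuracy like the above can be computed with the `datasets` library is shown below. The subset name and the column names (`question`, `A`–`D`, `answer`) are assumptions about the KMMLU schema, and `predict` is a placeholder for the actual model call.

```python
# Rough sketch of a KMMLU-style accuracy computation. The subset name and
# the column names ("question", "A".."D", "answer") are assumptions about
# the dataset schema; `predict` stands in for your model's inference call.
from datasets import load_dataset

def predict(question: str, choices: dict) -> str:
    """Placeholder: return one of 'A', 'B', 'C', 'D' from the model."""
    raise NotImplementedError

ds = load_dataset("HAERAE-HUB/KMMLU", "Math", split="test")  # subset name assumed

letters = ["A", "B", "C", "D"]
correct = 0
for row in ds:
    choices = {k: row[k] for k in letters}
    pred = predict(row["question"], choices)
    gold = letters[row["answer"] - 1]  # assuming the gold answer is 1-indexed
    correct += pred == gold
print(f"accuracy: {correct / len(ds):.2f}")
```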