Update README.md
README.md (CHANGED)
@@ -42,20 +42,32 @@ DNA 1.0 8B Instruct was fine-tuned on approximately 10B tokens of carefully cura
We evaluated DNA 1.0 8B Instruct against other prominent language models of similar size across various benchmarks, including Korean-specific tasks and general language understanding metrics. More details will be provided in the upcoming <u>Technical Report</u>.

| Language | Benchmark  | **dnotitia/DNA-1.0-8B-Instruct** | LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct | LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct | yanolja/EEVE-Korean-Instruct-10.8B-v1.0 | Qwen/Qwen2.5-7B-Instruct | meta-llama/Llama-3.1-8B-Instruct | mistralai/Mistral-7B-Instruct-v0.3 | NCSOFT/Llama-VARCO-8B-Instruct | upstage/SOLAR-10.7B-Instruct-v1.0 |
|----------|------------|----------------------------------|--------------------------------------|--------------------------------------|-----------------------------------------|--------------------------|----------------------------------|------------------------------------|--------------------------------|-----------------------------------|
| Korean   | KMMLU      | **53.26** (1st) | 45.30 | 45.28 | 42.17 | 45.66 | 41.66 | 31.45 | 38.49 | 41.50 |
|          | KMMLU-hard | **29.46** (1st) | 23.17 | 20.78 | 19.25 | 24.78 | 20.49 | 17.86 | 19.83 | 20.61 |
|          | KoBEST     | **83.40** (1st) | 79.05 | 80.13 | <u>81.67</u> | 78.51 | 67.56 | 63.77 | 72.99 | 73.26 |
|          | Belebele   | **57.99** (1st) |       | 45.11 | 49.40 | 54.85 | 54.70 | 40.31 | 53.17 | 48.68 |
|          | CSATQA     | **43.32** (1st) | 40.11 | 34.76 | 39.57 | 45.45 | 36.90 | 27.27 | 32.62 | 34.22 |
| English  | MMLU       | 66.59 (3rd)     | 65.27 | 64.32 | 63.63 | **74.26** | <u>68.26</u> | 62.04 | 63.25 | 65.30 |
|          | MMLU-Pro   | **43.05** (1st) |       | 38.90 | 32.79 | <u>42.5</u> | 40.92 | 33.49 | 37.11 | 30.25 |
|          | GSM8K      | **80.52** (1st) | 65.96 | <u>80.06</u> | 56.18 | 75.74 | 75.82 | 49.66 | 64.14 | 69.22 |

- The highest scores are shown in **bold**, and the second-highest scores are <u>underlined</u>.

**Evaluation Protocol**

For easy reproduction of our evaluation results, we list the evaluation tools and settings used below:

|            | Evaluation setting | Metric                              | Evaluation tool |
|------------|--------------------|-------------------------------------|-----------------|
| KMMLU      | 5-shot             | mean / exact\_match                 | lm-eval-harness |
| KMMLU Hard | 5-shot             | mean / exact\_match                 | lm-eval-harness |
| KoBEST     | 5-shot             | macro\_avg / f1                     | lm-eval-harness |
| Belebele   | 0-shot             | mean / acc                          | lm-eval-harness |
| CSATQA     | 0-shot             | mean / acc\_norm                    | lm-eval-harness |
| MMLU       | 5-shot             | mean / acc                          | lm-eval-harness |
| MMLU Pro   | 5-shot             | mean / exact\_match                 | lm-eval-harness |
| GSM8K      | 5-shot             | acc, exact\_match & strict\_extract | lm-eval-harness |
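
To make these settings concrete, here is a minimal sketch of how one row of the table (5-shot KMMLU) could be reproduced with lm-evaluation-harness's Python API. It is illustrative only, not the authors' exact script: the task identifier `kmmlu`, the `bfloat16` dtype, and the batch size are assumptions, and registered task names can differ between harness versions.

```python
# Hedged sketch: reproduce the 5-shot KMMLU setting with lm-evaluation-harness.
# Assumptions (not confirmed by the model card): task name "kmmlu", bfloat16
# weights, batch size 8. Adjust to the task names in your installed harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # HuggingFace transformers backend
    model_args="pretrained=dnotitia/DNA-1.0-8B-Instruct,dtype=bfloat16",
    tasks=["kmmlu"],  # assumed registered task/group name
    num_fewshot=5,    # matches the 5-shot setting in the table above
    batch_size=8,
)

# The harness reports aggregated metrics per task under results["results"].
print(results["results"]["kmmlu"])
```

The other rows follow the same pattern with the listed shot counts (for example, `num_fewshot=0` for Belebele and CSATQA); equivalent runs can also be launched from the harness command line with the matching `--tasks` and `--num_fewshot` flags.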
## Quickstart