Update README.md (#3)
opened by kushal-tri

README.md CHANGED
@@ -62,26 +62,23 @@ Here are the evaluation results for DCLM-1B models on various tasks (using [llm-
 
 Note: All scores are presented as decimal values between 0 and 1, representing the proportion of correct answers or the model's performance on each task.
 
-Moreover, we present our evaluation results on Length-Controlled Alpaca-Eval 2.0 to measure our instruction-following capabilities.
+Moreover, we present our evaluation results on Length-Controlled Alpaca-Eval 2.0 to measure our instruction-following capabilities.
 
 | Model                              | AlpacaEval2.0 LC Win-rate (%) |
 |------------------------------------|------------------------------:|
 | **Our runs**                       |                               |
-| DCLM-IT-1B                         | 8.6
+| DCLM-IT-1B                         |                       **8.6** |
 | DCLM-IT-7B                         |                          16.6 |
-
-| DCLM-Baseline-7B w/ OpenHermes 2.5 |                          13.8 |
-| **Reported from the leaderboard**  |                               |
-| LLaMA-3-Instruct-8B                |                      **22.9** |
-| Mistral-v0.2-7B                    |                          17.1 |
-| Mistral-7B w/ OpenHermes 2.5       |                          16.2 |
-| Zephyr-Beta-7B                     |                          13.2 |
-| Vicuna-v1.3-13B                    |                          10.8 |
+| **Reported from the leaderboard**  |                               |
 | Gemma-Instruct-7B                  |                          10.4 |
 | Nous-Hermes-13B                    |                           9.7 |
 | DaVinci001                         |                           9.0 |
 | LLaMA-2-Chat-13B                   |                           8.4 |
 | Alpaca-7B                          |                           5.9 |
+| Gemma-Instruct-2B                  |                           5.4 |
+| Phi-2 SFT                          |                           5.9 |
+| Qwen1.5 1.8B Chat                  |                           2.6 |
+|--------------------------------------------------------------------|
 
 ## Example Code
 
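As context for the AlpacaEval 2.0 numbers in the diff above, here is a minimal sketch of how a win rate is derived from pairwise judge annotations. This is not this repository's evaluation code: the `annotations.json` filename and the `preference` field convention are assumptions modeled on AlpacaEval's annotation format, and the sketch computes the plain win rate, not the length-controlled variant reported in the table (which additionally regresses out the judge's response-length bias).

```python
# Hedged sketch: estimate an AlpacaEval-style win rate from judge annotations.
# Assumptions (not taken from this PR): annotations are a JSON list of records
# with a "preference" field in [1.0, 2.0], where 1.0 means the baseline's
# response was preferred, 2.0 means the evaluated model's response was
# preferred, and fractional values give partial credit.
import json


def win_rate(path: str) -> float:
    """Return the win rate (%) of the evaluated model over the baseline."""
    with open(path) as f:
        annotations = json.load(f)
    # Map each preference in [1.0, 2.0] onto a win score in [0.0, 1.0],
    # skipping records where the judge produced no preference.
    scores = [a["preference"] - 1.0 for a in annotations
              if a.get("preference") is not None]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    print(f"Win rate: {win_rate('annotations.json'):.1f}%")
```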