Update README.md

README.md
Results show that EuroLLM-1.7B is superior to TinyLlama-v1.1 and similar to Gemma-2B.
#### Arc Challenge

| Model | Average | English | German | Spanish | French | Italian | Portuguese | Chinese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|--------------------|---------|---------|--------|---------|--------|---------|------------|---------|---------|-------|--------|---------|-------|-----------|----------|-----------|--------|---------|
| EuroLLM-1.7B | 0.3496 | 0.4061 | 0.3464 | 0.3684 | 0.3627 | 0.3738 | 0.3855 | 0.3521 | 0.3208 | 0.3507 | 0.3045 | 0.3605 | 0.2928 | 0.3271 | 0.3488 | 0.3516 | 0.3513 | 0.3396 |
| TinyLlama-v1.1 | 0.2650 | 0.3712 | 0.2524 | 0.2795 | 0.2883 | 0.2652 | 0.2906 | 0.2410 | 0.2669 | 0.2404 | 0.2310 | 0.2687 | 0.2354 | 0.2449 | 0.2476 | 0.2524 | 0.2494 | 0.2796 |
| Gemma-2B | 0.3617 | 0.4846 | 0.3755 | 0.3940 | 0.4080 | 0.3687 | 0.3872 | 0.3726 | 0.3456 | 0.3328 | 0.3122 | 0.3519 | 0.2851 | 0.3039 | 0.3590 | 0.3601 | 0.3565 | 0.3516 |
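The Average column appears to be the unweighted mean of the per-language accuracies; a quick check against the EuroLLM-1.7B row above:

```python
# Per-language Arc Challenge accuracies for EuroLLM-1.7B, copied from the table.
arc_eurollm = [0.4061, 0.3464, 0.3684, 0.3627, 0.3738, 0.3855, 0.3521,
               0.3208, 0.3507, 0.3045, 0.3605, 0.2928, 0.3271, 0.3488,
               0.3516, 0.3513, 0.3396]
# The unweighted mean reproduces the reported Average.
print(round(sum(arc_eurollm) / len(arc_eurollm), 4))  # 0.3496
```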
#### Hellaswag

| Model | Average | English | German | Spanish | French | Italian | Portuguese | Russian | Dutch | Arabic | Swedish | Hindi | Hungarian | Romanian | Ukrainian | Danish | Catalan |
|--------------------|---------|---------|--------|---------|--------|---------|------------|---------|--------|--------|---------|-------|-----------|----------|-----------|--------|---------|
| EuroLLM-1.7B | 0.4760 | 0.6057 | 0.4793 | 0.5337 | 0.5298 | 0.5085 | 0.5224 | 0.4654 | 0.4949 | 0.4104 | 0.4800 | 0.3655 | 0.4097 | 0.4606 | 0.4360 | 0.4702 | 0.4445 |
| TinyLlama-v1.1 | 0.3674 | 0.6248 | 0.3650 | 0.4137 | 0.4010 | 0.3780 | 0.3892 | 0.3494 | 0.3588 | 0.2880 | 0.3561 | 0.2841 | 0.3073 | 0.3267 | 0.3349 | 0.3408 | 0.3613 |
| Gemma-2B | 0.4666 | 0.7165 | 0.4756 | 0.5414 | 0.5180 | 0.4841 | 0.5081 | 0.4664 | 0.4655 | 0.3868 | 0.4383 | 0.3413 | 0.3710 | 0.4316 | 0.4291 | 0.4471 | 0.4448 |
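Both benchmarks are multiple-choice tasks, and such accuracies are typically computed by ranking the answer choices by model likelihood rather than by free-form generation. Below is a minimal sketch of that scoring scheme, assuming the Hugging Face `transformers` checkpoint `utter-project/EuroLLM-1.7B` and a made-up example item; it illustrates the general technique, not the exact harness or prompts behind these tables:

```python
# Sketch of multiple-choice scoring with a causal LM: append each answer
# choice to the context and pick the choice with the highest
# length-normalized log-likelihood. Illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-1.7B"  # assumed checkpoint for this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def choice_score(context: str, choice: str) -> float:
    """Mean log-probability of the choice tokens, conditioned on the context."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i + 1, hence the shift below.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    choice_ids = full_ids[0, ctx_len:]  # tokens contributed by the choice
    # NB: real harnesses align tokenization boundaries more carefully.
    token_lp = logprobs[ctx_len - 1:].gather(1, choice_ids.unsqueeze(1))
    return (token_lp.sum() / max(choice_ids.numel(), 1)).item()

# Hypothetical item in the Arc Challenge style (not taken from the dataset).
question = "Question: Which gas do plants take in during photosynthesis?\nAnswer:"
choices = [" carbon dioxide", " oxygen", " nitrogen", " argon"]
print(max(choices, key=lambda c: choice_score(question, c)))
```

Per-language scores are typically obtained on translated versions of these benchmarks, and details such as few-shot count and length normalization shift absolute numbers, so a sketch like this will not reproduce the table exactly.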
## Bias, Risks, and Limitations

EuroLLM-1.7B has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).