EleutherAI/polyglot-ko-12.8b

@@ -73,14 +73,16 @@ model = AutoModelForCausalLM.from_pretrained("EleutherAI/polyglot-ko-12.8b")
 ## Evaluation results
-We evaluate Polyglot-Ko-12.8B on [KOBEST dataset](https://arxiv.org/abs/2204.04541), a benchmark with 5 downstream tasks, against comparable models such as skt/ko-gpt-trinity-1.2B-v0.5, kakaobrain/kogpt and facebook/xglm-7.5B, using the prompts provided in the paper.
 The following tables show the results for different numbers of few-shot examples. You can reproduce these results using the [polyglot branch of lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/polyglot) and the following script. For a fair comparison, all models were run under the same conditions and with the same prompts. In the tables, `n` refers to the number of few-shot examples.
 ```console
 python main.py \
    --model gpt2 \
-   --model_args pretrained='EleutherAI/polyglot-ko-12.8b' \
    --tasks kobest_copa,kobest_hellaswag \
    --num_fewshot $YOUR_NUM_FEWSHOT \
    --batch_size $YOUR_BATCH_SIZE \
@@ -90,7 +92,7 @@ python main.py \
 ### COPA (F1)
-| Model                                                                                        | params | n=0 | n=5 | n=10 | n=50 |
 |----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
 | [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.6696 | 0.6477 | 0.6419  | 0.6514  |
 | [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.7345 | 0.7287 | 0.7277  | 0.7479  |
@@ -98,23 +100,65 @@ python main.py \
 | [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.7196 | 0.7193 | 0.7204  | 0.7206  |
 | [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.7595 | 0.7608 | 0.7638  | 0.7788  |
 | [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.7745 | 0.7676 | 0.7775  | 0.7887  |
-| **[EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) (this)** |**12.8B**|**0.7937**|**0.8108**|**0.8037**|**0.8368**|
-<img src="https://user-images.githubusercontent.com/19511788/233820235-6f617932-3b18-4534-be14-8df9e80b8a06.jpg" width="1000px">
 ### HellaSwag (F1)
-| Model                                                                                          | params |n=0 | n=5 | n=10 | n=50 |
-|------------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
-| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)            | 1.2B   | 0.5243 | 0.5272 | 0.5166  | 0.5352  |
-| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                    | 6.0B   | 0.5590 | 0.5833 | 0.5828  | 0.5907  |
-| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                                | 7.5B   | 0.5665 | 0.5689 | 0.5565  | 0.5622  |
-| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)              | 1.3B   | 0.5247 | 0.5260 | 0.5278  | 0.5427  |
-| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)              | 3.8B   | 0.5707 | 0.5830 | 0.5670  | 0.5787  |
-| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)              | 5.8B   | 0.5976 | 0.5998 | 0.5979  | 0.6208  |
-| **[EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) (this)** | **12.8B**  | **0.5954** | **0.6306** | **0.6098**  | **0.6118**  |
-<img src="https://user-images.githubusercontent.com/19511788/233820233-0127983e-4b37-48ce-89e5-51509ed9b1f2.jpg" width="1000px">
 ## Limitations and Biases

 ## Evaluation results
+We evaluate Polyglot-Ko-12.8B on the [KOBEST dataset](https://arxiv.org/abs/2204.04541), a benchmark with 5 downstream tasks, against comparable models such as skt/ko-gpt-trinity-1.2B-v0.5, kakaobrain/kogpt, and facebook/xglm-7.5B, using the prompts provided in the paper.
 The following tables show the results for different numbers of few-shot examples. You can reproduce these results using the [polyglot branch of lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/polyglot) and the following script. For a fair comparison, all models were run under the same conditions and with the same prompts. In the tables, `n` refers to the number of few-shot examples.
+On the WiC dataset, all models perform at roughly chance level.
 ```console
 python main.py \
    --model gpt2 \
+   --model_args pretrained='EleutherAI/polyglot-ko-12.8b' \
    --tasks kobest_copa,kobest_hellaswag \
    --num_fewshot $YOUR_NUM_FEWSHOT \
    --batch_size $YOUR_BATCH_SIZE \
 ### COPA (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
 |----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
 | [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.6696 | 0.6477 | 0.6419  | 0.6514  |
 | [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.7345 | 0.7287 | 0.7277  | 0.7479  |
 | [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.7196 | 0.7193 | 0.7204  | 0.7206  |
 | [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.7595 | 0.7608 | 0.7638  | 0.7788  |
 | [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.7745 | 0.7676 | 0.7775  | 0.7887  |
+| **[EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) (this)** | **12.8B** | **0.7937** | **0.8108** | **0.8037** | **0.8369** |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/d5b49364-aed5-4467-bae2-5a322c8e2ceb" width="800px">
 ### HellaSwag (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.5243 | 0.5272 | 0.5166  | 0.5352  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.5590 | 0.5833 | 0.5828  | 0.5907  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.5665 | 0.5689 | 0.5565  | 0.5622  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.5247 | 0.5260 | 0.5278  | 0.5427  |
+| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.5707 | 0.5830 | 0.5670  | 0.5787  |
+| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.5976 | 0.5998 | 0.5979  | 0.6208  |
+| **[EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) (this)** | **12.8B** | **0.5954** | **0.6306** | **0.6098** | **0.6118** |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/5acb60ac-161a-4ab3-a296-db4442e08b7f" width="800px">
+### BoolQ (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.3356 | 0.4014 | 0.3640  | 0.3560  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.4514 | 0.5981 | 0.5499  | 0.5202  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.4464 | 0.3324 | 0.3324  | 0.3324  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.3552 | 0.4751 | 0.4109  | 0.4038  |
+| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.4320 | 0.5263 | 0.4930  | 0.4038  |
+| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.4356 | 0.5698 | 0.5187  | 0.5236  |
+| **[EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) (this)** | **12.8B** | **0.4818** | **0.6041** | **0.6289** | **0.6448** |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/b74c23c0-01f3-4b68-9e10-a48e9aa052ab" width="800px">
+### SentiNeg (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.6065 | 0.6878 | 0.7280  | 0.8413  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.3747 | 0.8942 | 0.9294  | 0.9698  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.3578 | 0.4471 | 0.3964  | 0.5271  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.6790 | 0.6257 | 0.5514  | 0.7851  |
+| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.4858 | 0.7950 | 0.7320  | 0.7851  |
+| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.3394 | 0.8841 | 0.8808  | 0.9521  |
+| **[EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) (this)** | **12.8B** | **0.9117** | **0.9015** | **0.9345** | **0.9723** |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/95b56b19-d349-4b70-9ff9-94a5560f89ee" width="800px">
+### WiC (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.3290 | 0.4313 | 0.4001  | 0.3621  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.3526 | 0.4775 | 0.4358  | 0.4061  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.3280 | 0.4903 | 0.4945  | 0.3656  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.3297 | 0.4850 | 0.4650  | 0.3290  |
+| [EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b)            | 3.8B   | 0.3390 | 0.4944 | 0.4203  | 0.3835  |
+| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.3913 | 0.4688 | 0.4189  | 0.3910  |
+| **[EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b) (this)** | **12.8B** | **0.3985** | **0.3683** | **0.3307** | **0.3273** |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/4de4a4c3-d7ac-4e04-8b0c-0d533fe88294" width="800px">
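
As a rough sanity check on the WiC numbers, the sketch below computes macro-averaged F1 for a classifier that always predicts the same label on a balanced two-class set. It is illustrative only: it assumes the reported F1 is macro-averaged and that the WiC labels are roughly balanced, neither of which is stated in this section, and the helper names `f1_for_class` and `macro_f1` are hypothetical rather than part of lm-evaluation-harness.

```python
def f1_for_class(gold, pred, cls):
    """F1 score for a single class, treating `cls` as the positive label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        # No true positives: F1 is 0 by convention for this class.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(gold) | set(pred))
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


# Assumption: a balanced binary task (500 examples per label).
gold = [0, 1] * 500
# A degenerate model that answers "1" for every example.
constant_pred = [1] * len(gold)

print(round(macro_f1(gold, constant_pred), 4))
```

Under those assumptions a constant predictor scores about 0.33, which is close to several WiC entries above (for example 0.3290 and 0.3273), so those scores may reflect single-label predictions rather than learned word-sense discrimination.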
 ## Limitations and Biases