add table
evaluation/intro.txt  CHANGED  +3 -3
@@ -9,11 +9,11 @@ For most models, we sample 200 candidate program completions, and compute pass@1
 |InCoder 🦜 (6.7B) | 15.2% | 27.8% | 47.00% |
 |||||
 |Codex (25M)| 3.21% | 7.1% | 12.89%|
-|Codex (85M)| 8.22% | 12.81% | 22.40% |
 |Codex (300M)| 13.17%| 20.37% | 36.27% |
 |Codex (12B)| 28.81%| 46.81% | 72.31% |
 |||||
 |GPT-neo (125M)| 0.75% | 1.88% | 2.97% |
 |GPT-neo (1.5B)| 4.79% | 7.47% | 16.30% |
-|GPT-
-|
+|GPT-J (6B)| 11.62% | 15.74% | 27.74% |
+
+To better understand how the pass@k metric works, we will illustrate it with some examples. We select 4 tasks from the HumanEval dataset and see how the models perform and which code completions pass the unit tests.
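The hunk's context line notes that 200 candidate completions are sampled per task. The standard way to turn such samples into pass@k is the unbiased estimator from the paper that introduced HumanEval (Chen et al., 2021): with n sampled completions per task, c of which pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k). Below is a minimal sketch of that estimator; the function name `pass_at_k` and the example counts are illustrative, not taken from this file.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total completions sampled for the task
    c: completions that pass the unit tests
    k: the k in pass@k
    """
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    # Product form of 1 - C(n-c, k) / C(n, k); avoids huge binomials.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example (hypothetical counts): 200 samples per task, 10 of which pass.
print(pass_at_k(200, 10, 1))    # 0.05 (for k=1 this reduces to c/n)
print(pass_at_k(200, 10, 100))  # ~0.999
```

Note how quickly the metric saturates as k grows: a model that solves a task on only 5% of samples will almost surely solve it at least once in 100 draws, which is why the pass@100 column sits far above pass@1 for every model in the table.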