update
- evaluation/intro.txt +1 -1
evaluation/intro.txt
CHANGED
@@ -56,7 +56,7 @@ Results: {'pass@1': 0.0750, 'pass@10': 0.4473, 'pass@20': 0.5}
 ````

 If we take a closer look at the unit test results for each candidate solution in the three tasks, we find that only 3 passed the test for the second problem, and none did for the first problem. This means that we have 3 correct solutions among 40, which corresponds to our pass@1 value `3/40 = 0.075`. The scores pass@10 and pass@20 are higher, because the more samples we select from the candidate completions, the more likely we are to include the correct implementation. As
-for pass@20, it is `1/2=0.5`, since if we select all 20 candidates for each problem, the second problem get solved which gives 50% success rate. If you are curious about the candidate solutions that passed the tests, they all implemented this function:
+for pass@20, it is `1/2 = 0.5`, since if we select all 20 candidates for each problem, the second problem get solved which gives 50% success rate. If you are curious about the candidate solutions that passed the tests, they all implemented this function:

 ```python

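Note on the arithmetic in the changed paragraph: it can be checked with a short sketch. This assumes two problems with 20 candidates each (which the `3/40` and `1/2` figures imply), with 0 and 3 passing candidates respectively, and it assumes the scores come from the commonly used unbiased estimator pass@k = 1 - C(n-c, k) / C(n, k). The helper name `pass_at_k` and the variable names are illustrative and are not taken from the file.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem with n samples, c of them correct."""
    # If fewer than k samples are incorrect, any k-subset contains a correct one.
    if n - c < k:
        return 1.0
    # Otherwise: 1 - P(all k drawn samples are incorrect).
    return 1.0 - comb(n - c, k) / comb(n, k)


# Assumed setup: 2 problems, 20 candidates each; none pass the first, 3 pass the second.
n_samples = 20
correct_per_problem = [0, 3]

# Average the per-problem estimates for each k.
results = {
    f"pass@{k}": sum(pass_at_k(n_samples, c, k) for c in correct_per_problem)
    / len(correct_per_problem)
    for k in (1, 10, 20)
}
print(results)
```

Under these assumptions the sketch gives 0.075 for pass@1, about 0.447 for pass@10, and 0.5 for pass@20, in line with the `Results` values shown in the hunk header.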