Table 1 below shows the HumanEval pass@1, pass@10, and pass@100 scores of CodeParrot, InCoder, PolyCoder, CodeGen-Mono, and Codex (which is not open source).

| Model | pass@1 | pass@10 | pass@100 |
|-------|--------|---------|----------|
| CodeParrot (110M) | 3.80% | 6.57% | 12.78% |
| CodeParrot (1.5B) | 3.58% | 8.03% | 14.96% |
| CodeParrot (1.5B) | 3.99% | 8.69% | 17.88% |
| | | | |
| InCoder (6.7B) | 15.2% | 27.8% | 47.00% |
| | | | |
| PolyCoder (160M) | 2.13% | 3.35% | 4.88% |
| PolyCoder (400M) | 2.96% | 5.29% | 11.59% |
| PolyCoder (2.7B) | 5.59% | 9.84% | 17.68% |
| | | | |
| CodeGen-Mono (350M) | 12.76% | 23.11% | 35.19% |
| CodeGen-Mono (2.7B) | 23.70% | 36.64% | 57.01% |
| CodeGen-Mono (6.1B) | 26.13% | 42.29% | 65.82% |
| CodeGen-Mono (16.1B) | 29.28% | 49.86% | 75.00% |
| | | | |
| Codex (25M) | 3.21% | 7.1% | 12.89% |
| Codex (300M) | 13.17% | 20.37% | 36.27% |
| Codex (12B) | 28.81% | 46.81% | 72.31% |
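
For readers unfamiliar with the pass@k columns, the sketch below shows one common way such scores are computed: the unbiased estimator introduced alongside HumanEval, which for a problem with `n` generated samples of which `c` pass the unit tests gives `1 - C(n-c, k) / C(n, k)`. The table does not state how its numbers were produced, so treating this estimator as the method used here is an assumption. The function names `pass_at_k` and `mean_pass_at_k` are illustrative, not taken from the repository.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct.

    Uses 1 - C(n - c, k) / C(n, k): one minus the probability that a random
    size-k subset of the n samples contains no correct sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs, one per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical per-problem results: (samples generated, samples passing).
    toy_results = [(10, 5), (10, 0), (10, 10)]
    print(f"pass@1 = {mean_pass_at_k(toy_results, 1):.2%}")
```

Averaging the per-problem estimate over the benchmark yields a single percentage of the kind shown in each cell of the table; the toy inputs above are made up and unrelated to the reported scores.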