Spaces:

codeparrot
/

code-generation-models

Running

loubnabnl HF Staff commited on Jun 6, 2022

Commit

a8caee6

1 Parent(s): 4d8fc44

Update datasets/intro.md

Files changed (1) hide show

datasets/intro.md CHANGED Viewed

@@ -5,4 +5,4 @@ Below is the distribution of the pretraining data size of some code models, we p
     <img src="https://huggingface.co/datasets/loubnabnl/repo-images/resolve/main/data_distrub.png" alt="drawing" width="440"/>
 </p>
-Some other useful datasets that are available on the 🤗 Hub are [CodeSearchNet](https://huggingface.co/datasets/code_search_net), a corpus of 2 milllion (comment, code) pairs from open-source libraries hosted on GitHub for several programming languages, and [Mostly Basic Python Problems (mbpp)](https://huggingface.co/datasets/mbpp), a benchmark of around 1,000 crowd-sourced Python programming problems, for entry level programmers, where each problem consists of a task description, code solution and 3 automated test cases, this dataset was used in [InCoder](https://huggingface.co/facebook/incoder-6B) evaluation in addition to [HumanEval](https://huggingface.co/datasets/openai_humaneval) that we will present later.

     <img src="https://huggingface.co/datasets/loubnabnl/repo-images/resolve/main/data_distrub.png" alt="drawing" width="440"/>
 </p>
+Some other useful datasets that are available on the 🤗 Hub are [CodeSearchNet](https://huggingface.co/datasets/code_search_net), a corpus of 2 milllion (comment, code) pairs from open-source libraries hosted on GitHub for several programming languages, and [Mostly Basic Python Problems (mbpp)](https://huggingface.co/datasets/mbpp), a benchmark of around 1,000 crowd-sourced Python programming problems, for entry level programmers, where each problem consists of a task description, code solution and 3 automated test cases, this dataset was used in [InCoder](https://huggingface.co/facebook/incoder-6B) evaluation in addition to [HumanEval](https://huggingface.co/datasets/openai_humaneval) that we will present later. You can also find [APPS](https://huggingface.co/datasets/loubnabnl/apps), a benchmark with 10000 problems consisting of programming questions in English and code solutions in Python, this dataset was also used in Codex evaluation along with HumanEval.