Spaces: codeparrot/code-generation-models
loubnabnl HF staff committed on May 27, 2022

Commit

5b2e5a5

1 Parent(s): aae13ce

update

Files changed (1)

datasets/polycoder.txt CHANGED

@@ -1,4 +1,4 @@
-[PolyCoder paper](https://arxiv.org/pdf/2202.13169v3.pdf) gives a nice comparison of existing code models. The model was trained on **254GB** of data, after preprocessing, consisting of popular repositories for 12 popular programming languages with at least 50 stars from GitHub in October 2021. The data used the following preprocessing:
+[PolyCoder paper](https://arxiv.org/pdf/2202.13169v3.pdf) gives a nice comparison of existing code models. The authors also trained a code generation model on **254GB** of data, after preprocessing, consisting of popular repositories for 12 popular programming languages with at least 50 stars from GitHub in October 2021. The data used the following preprocessing:
 - Exact match deduplication
 - Filtering:
     - Average line length < 100 tokens
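
The two preprocessing steps visible in this hunk can be sketched as below. This is a minimal illustration, not the PolyCoder authors' pipeline: the function names (`exact_dedup`, `avg_line_length`, `preprocess`) are hypothetical, hashing whole file contents with SHA-256 is an assumed way to detect exact matches, and whitespace splitting is an assumed stand-in for whatever tokenizer the authors used. Only the filter shown in the hunk is implemented.

```python
import hashlib


def exact_dedup(files):
    """Keep the first occurrence of each distinct file content.

    Assumption: "exact match" means byte-identical content, detected here
    by comparing SHA-256 digests of the UTF-8 encoded text.
    """
    seen = set()
    unique = []
    for text in files:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique


def avg_line_length(text):
    """Average number of whitespace-separated tokens per line.

    Assumption: whitespace tokens approximate the tokens in the source.
    """
    lines = text.splitlines()
    if not lines:
        return 0.0
    return sum(len(line.split()) for line in lines) / len(lines)


def preprocess(files, max_avg_tokens=100):
    """Deduplicate, then keep files whose average line length is below the limit."""
    return [t for t in exact_dedup(files) if avg_line_length(t) < max_avg_tokens]


if __name__ == "__main__":
    sample = ["a b\nc d\n", "a b\nc d\n", " ".join(["x"] * 150)]
    # The duplicate is dropped and the single 150-token line is filtered out.
    print(len(preprocess(sample)))
```

Deduplicating before filtering means each distinct file is measured only once; the order does not change the result, since both steps depend only on the file content.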