lvwerra HF staff committed on
Commit
76aa86c
1 Parent(s): 78dbc2a

Update README.md

  1. README.md +8 -1
README.md CHANGED
@@ -7,4 +7,11 @@ sdk: static
 pinned: false
 ---
 
-Edit this `README.md` markdown file to author your organization card.
+# The Stack v2 Training Data
+
+This organization contains the full datasets used to train StarCoder2:
+
+- `the-stack-v2-train-full`: the training data covering 600+ programming languages, used to train StarCoder2-15B, with files concatenated per repository
+- `the-stack-v2-train-full-files`: same as `the-stack-v2-train-full` but without repository concatenation, which makes filtering by file or license easier
+- `the-stack-v2-train-smol`: the training data covering 17 programming languages, used to train StarCoder2-3B and StarCoder2-7B, with files concatenated per repository
+- `the-stack-v2-train-smol-files`: same as `the-stack-v2-train-smol` but without repository concatenation, which makes filtering by file or license easier
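
Note on the `-files` variants: the added README says the non-concatenated layout makes filtering by file or license easier. The following is a minimal sketch of why that might be, using toy in-memory records. The field names (`repo`, `path`, `license`, `content`) and both helper functions are hypothetical and chosen for illustration; they are not the actual dataset schema or the StarCoder2 preprocessing code.

```python
from typing import Dict, List, Set

# Toy per-file records. The field names are assumptions for illustration,
# not the real schema of the `-files` datasets.
FILE_ROWS: List[Dict[str, str]] = [
    {"repo": "org/a", "path": "main.py", "license": "MIT", "content": "print('hi')\n"},
    {"repo": "org/a", "path": "vendor/lib.c", "license": "GPL-3.0", "content": "int x;\n"},
    {"repo": "org/b", "path": "app.rs", "license": "Apache-2.0", "content": "fn main() {}\n"},
]


def filter_by_license(rows: List[Dict[str, str]], allowed: Set[str]) -> List[Dict[str, str]]:
    """Keep only the rows whose license is in the allowed set."""
    return [row for row in rows if row["license"] in allowed]


def concatenate_per_repo(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Join file contents per repository, in input order (hypothetical, no separators)."""
    parts: Dict[str, List[str]] = {}
    for row in rows:
        parts.setdefault(row["repo"], []).append(row["content"])
    return {repo: "".join(chunks) for repo, chunks in parts.items()}


# With one row per file, a license filter is a simple per-row predicate.
permissive = filter_by_license(FILE_ROWS, {"MIT", "Apache-2.0"})

# Concatenation can then be applied to the filtered rows. Once files are
# joined into one string per repository, dropping a single file would
# require splitting that string apart again.
repo_docs = concatenate_per_repo(permissive)
```

In this sketch the GPL-licensed file in `org/a` is dropped before concatenation, so the per-repository text for `org/a` contains only `main.py`.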