Update README.md
README.md
CHANGED
@@ -7,4 +7,11 @@ sdk: static
 pinned: false
 ---
 
-
+# The Stack v2 Training Data
+
+This organization contains the full datasets used to train StarCoder2:
+
+- `the-stack-v2-train-full`: the training data covering 600+ programming languages, used to train StarCoder2-15B, with files concatenated per repository
+- `the-stack-v2-train-full-files`: same as `the-stack-v2-train-full` but without repository concatenation, which makes filtering by file or license easier
+- `the-stack-v2-train-smol`: the training data covering 17 programming languages, used to train StarCoder2-3B and 7B, with files concatenated per repository
+- `the-stack-v2-train-smol-files`: same as `the-stack-v2-train-smol` but without repository concatenation, which makes filtering by file or license easier
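The README notes that the `-files` variants make filtering by file or license easier. A minimal sketch of what that per-file filtering could look like is below. It runs on a few in-memory sample records rather than the real datasets, and the field names `path`, `language`, and `license_type`, as well as the helper `filter_files`, are assumptions made for illustration, not something this README confirms.

```python
from typing import Dict, Iterable, Iterator, Optional, Set

# Hypothetical per-file records. The field names are assumptions for
# illustration; check the dataset card for the actual schema.
SAMPLE_FILES = [
    {"path": "a.py", "language": "Python", "license_type": "permissive"},
    {"path": "b.c", "language": "C", "license_type": "no_license"},
    {"path": "c.py", "language": "Python", "license_type": "no_license"},
]


def filter_files(
    records: Iterable[Dict[str, str]],
    languages: Optional[Set[str]] = None,
    license_types: Optional[Set[str]] = None,
) -> Iterator[Dict[str, str]]:
    """Yield records matching the given languages and license types.

    A criterion set to None is not applied.
    """
    for record in records:
        if languages is not None and record.get("language") not in languages:
            continue
        if license_types is not None and record.get("license_type") not in license_types:
            continue
        yield record


kept = list(
    filter_files(SAMPLE_FILES, languages={"Python"}, license_types={"permissive"})
)
print([r["path"] for r in kept])  # prints ['a.py']
```

Because each record is a single file, the filter can keep or drop files independently. With the repository-concatenated variants, the same decision would have to be made on a whole repository at once, which is presumably why the README recommends the `-files` variants for this kind of work. The generator accepts any iterable of dictionaries, so it could also be applied to records streamed from a dataset loader.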