update datasets
datasets/github_code.txt CHANGED (+1 -1)
@@ -1,4 +1,4 @@
- We also released [Github code dataset](https://huggingface.co/datasets/lvwerra/github-code), a 1TB of code data from Github repositories in 32 programming languages. The dataset can be loaded in a streaming mode if you don't want to download it because of memory issues, this will create an iterable dataset:
```python
from datasets import load_dataset
+ We also released [Github code dataset](https://huggingface.co/datasets/lvwerra/github-code), a 1TB of code data from Github repositories in 32 programming languages. It was created from the public GitHub dataset on Google [BigQuery](https://cloud.google.com/blog/topics/public-datasets/github-on-bigquery-analyze-all-the-open-source-code). The dataset can be loaded in a streaming mode if you don't want to download it because of memory issues, this will create an iterable dataset:
```python
from datasets import load_dataset