Update README.md
README.md CHANGED

(Removed in this update: a blank separator line and the previous plain OctoPack, The Stack, and SantaCoder link lists, which the additions below replace with collapsible sections.)

@@ -20,7 +20,6 @@ pinned: false
BigCode is an open scientific collaboration working on responsible training of large language models for coding applications. You can find more information on the main [website](https://www.bigcode-project.org/) or follow BigCode on [Twitter](https://twitter.com/BigCodeProject). In this organization you can find the artifacts of this collaboration: **StarCoder**, a state-of-the-art language model for code, **OctoPack**, artifacts for instruction tuning large code models, **The Stack**, the largest available pretraining dataset of permissively licensed code, and **SantaCoder**, a 1.1B parameter model for code.

---
<details>
<summary>
<b><font size="+1">💫StarCoder</font></b>

@@ -50,39 +49,46 @@
- [StarCoder Search](https://huggingface.co/spaces/bigcode/search): Full-text search over the code in the pretraining dataset.
- [StarCoder Membership Test](https://stack.dataportraits.org/): Blazing-fast check of whether code was present in the pretraining dataset.
</details>
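
A minimal completion sketch with 🤗 Transformers, assuming the StarCoder checkpoint lives at `bigcode/starcoder` on the Hub (the repo id is not spelled out in the visible part of this README) and that its gated license has already been accepted with a logged-in Hub account; the prompt is purely illustrative.

```python
# Sketch: greedy code completion with StarCoder via Transformers.
# Assumes `pip install transformers torch` and access to the gated checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"  # assumed repo id, not stated in this README

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```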

---
<details>
<summary>
<b><font size="+1">🐙OctoPack</font></b>
</summary>

OctoPack consists of data, evaluations, and models relating to Code LLMs that follow human instructions.

- [Paper](https://arxiv.org/abs/2308.07124): Research paper with details about all components of OctoPack.
- [GitHub](https://github.com/bigcode-project/octopack): All code used for the creation of OctoPack.
- [CommitPack](https://huggingface.co/datasets/bigcode/commitpack): 4TB of Git commits.
- [Am I in the CommitPack](https://huggingface.co/spaces/bigcode/in-the-commitpack): Check if your code is in CommitPack.
- [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft): 2GB of high-quality Git commits that resemble instructions.
- [HumanEvalPack](https://huggingface.co/datasets/bigcode/humanevalpack): Benchmark for code fixing, explaining, and synthesis across Python, JavaScript, Java, Go, C++, and Rust.
- [OctoCoder](https://huggingface.co/bigcode/octocoder): Instruction-tuned version of StarCoder, trained on CommitPackFT.
- [OctoCoder Demo](https://huggingface.co/spaces/bigcode/OctoCoder-Demo): Play with OctoCoder.
- [OctoGeeX](https://huggingface.co/bigcode/octogeex): Instruction-tuned version of CodeGeeX2, trained on CommitPackFT.
</details>
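
To give a feel for how these pieces fit together, the sketch below streams one CommitPackFT record and then queries OctoCoder like any other causal LM. The `"python"` subset name and the Question/Answer prompt layout are assumptions to verify against the dataset and model cards; they are not stated in this README.

```python
# Sketch: peek at CommitPackFT, then prompt OctoCoder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stream instead of downloading the full dataset; "python" is an assumed config name.
commits = load_dataset("bigcode/commitpackft", "python", split="train", streaming=True)
print(next(iter(commits)))  # inspect one record and its fields

checkpoint = "bigcode/octocoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Question/Answer mirrors common instruction-tuning prompts; check the model card
# for the exact template OctoCoder expects.
prompt = "Question: Write a Python function that reverses a string.\n\nAnswer:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```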

---
<details>
<summary>
<b><font size="+1">📑The Stack</font></b>
</summary>

The Stack is 6.4TB of source code in 358 programming languages, collected from permissively licensed repositories.

- [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact-deduplicated version of The Stack.
- [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near-deduplicated version of The Stack (recommended for training).
- [The Stack issues](https://huggingface.co/datasets/bigcode/the-stack-github-issues): Collection of GitHub issues.
- [The Stack Metadata](https://huggingface.co/datasets/bigcode/the-stack-metadata): Metadata of the repositories in The Stack.
- [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
</details>
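
Because the full dataset is multiple terabytes, a common pattern is to stream a single language directory rather than download everything. Treat the `data_dir` value below as an assumption to check against the dataset card, and note that access requires accepting the dataset's terms on the Hub.

```python
# Sketch: stream the Python portion of the near-deduplicated Stack.
from datasets import load_dataset

ds = load_dataset(
    "bigcode/the-stack-dedup",
    data_dir="data/python",  # assumed layout; see the dataset card
    split="train",
    streaming=True,          # avoids materializing terabytes locally
)

for example in ds.take(3):
    print(len(example["content"]))  # "content" holds the source file text
```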

---
<details>
<summary>
<b><font size="+1">🎅SantaCoder</font></b>
</summary>

SantaCoder, aka smol StarCoder: the same architecture, but trained only on Python, Java, and JavaScript.

- [SantaCoder](https://huggingface.co/bigcode/santacoder): The SantaCoder model.
- [SantaCoder Demo](https://huggingface.co/spaces/bigcode/santacoder-demo): Write with SantaCoder.
- [SantaCoder Search](https://huggingface.co/spaces/bigcode/santacoder-search): Search code in the pretraining dataset.
- [SantaCoder License](https://huggingface.co/spaces/bigcode/license): The OpenRAIL license for SantaCoder.
</details>
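
A minimal generation sketch for SantaCoder, assuming the standard Transformers causal-LM API; the `trust_remote_code=True` flag reflects that the checkpoint has shipped custom modeling code, and the prompt is illustrative.

```python
# Sketch: short completion with SantaCoder.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/santacoder"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# The checkpoint has relied on custom modeling code (multi-query attention),
# so Transformers is asked to trust and execute it.
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

prompt = "def hello_world():"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```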