|
This is an **interactive** blog that provides an overview of open-source language models for code generation. This post presents code datasets, model architectures, and evaluations, along with examples and tips for using the 🤗 Hub for this task. At the end of this blog, you will find a **demo** to test and compare code generation across these models directly in the browser! ✨
|
|
|
|
|
## Introduction
|
|
|
The application of language models to code generation has sparked great interest recently. You have probably heard of [Codex](https://arxiv.org/pdf/2107.03374v2.pdf), the model behind [GitHub Copilot](https://copilot.github.com/), or [AlphaCode](https://arxiv.org/pdf/2203.07814v1.pdf) for competition-level programming. These models aren't open-source, and they are hard to reproduce with a limited budget and incomplete information about their training. Luckily, the ML community has contributed some code models to allow for further research.
|
|
|
However, it can be easy to get lost among the many available models. At Hugging Face we aim to democratize ML and centralize all information in the 🤗 ecosystem to make the usage of open-source tools easier and more efficient. Code models are no exception: you can find all open-source models on the Hub, along with several code datasets and evaluation metrics. In this blog we will give an overview of these tools and how to use them.
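To give a flavor of the evaluation metrics mentioned above: a common one for code models is pass@k, introduced in the Codex paper linked earlier. It estimates the probability that at least one of k generated samples passes the unit tests, given n total samples of which c are correct. Below is a minimal sketch of the unbiased estimator from that paper (the function name and the example numbers are our own, chosen for illustration):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper.

    n: total number of samples generated per problem
    c: number of samples that passed the unit tests
    k: evaluation budget (how many samples we get to submit)

    Returns the estimated probability that at least one of k
    randomly drawn samples (out of n) is correct:
        pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        # Fewer incorrect samples than the budget: success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 10 samples, 3 correct, budget of 1.
print(pass_at_k(10, 3, 1))  # → 0.3 (equals the raw success rate for k=1)
```

For k=1 the estimator reduces to the plain fraction of correct samples; the combinatorial form matters for larger k, where naively taking the best of k samples would bias the estimate.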