This is an **interactive** blog that gives an overview of open-source language models for code generation. We present their pretraining datasets, model architectures, and evaluation, along with examples and tips for using the 🤗 hub for this task. At the end of this blog, you will find a **demo** to test and compare code generation across these models ✨.

## Introduction

The application of language models to code generation has sparked significant interest recently. You have probably heard of [Codex](https://arxiv.org/pdf/2107.03374v2.pdf), the model behind [GitHub Copilot](https://copilot.github.com/), or [AlphaCode](https://arxiv.org/pdf/2203.07814v1.pdf) for competition-level programming. These models aren't open-source, and it is hard to reproduce them with a limited budget and incomplete information about their training. Luckily, the ML community has contributed some code models to allow for further research.

However, it can be easy to get lost among the models. At Hugging Face we aim to democratize ML and centralize all information in the 🤗 hub to make the usage of open-source tools easier and more efficient. Code models aren't an exception: you can find all open-source models on the hub, along with several code datasets and evaluation metrics. In this blog we will give an overview of these tools and how to use them.