---
{}
---
# CodeParrot

This is a small version of the CodeParrot tokenizer trained on the [CodeParrot Python code dataset](https://huggingface.co/datasets/transformersbook/codeparrot). The tokenizer is trained in Chapter 10: Training Transformers from Scratch in the [NLP with Transformers book](https://learning.oreilly.com/library/view/natural-language-processing/9781098103231/). You can find the full code in the accompanying [Github repository](https://github.com/nlp-with-transformers/notebooks/blob/main/10_transformers-from-scratch.ipynb).