Training - essentially I only want to code Python.
#1, opened by wws911
Hi there, I would like to use this model only for Python code. What would be the most efficient way to reduce the file size and, at the same time, train and fine-tune it strictly for Python?
If you want to fine-tune strictly for Python, you just need to change the dataset.
You can check out: https://huggingface.co/datasets/bigcode/the-stack-smol
For Python:
from datasets import load_dataset
load_dataset("bigcode/the-stack-smol", data_dir="data/python")
DatasetDict({
    train: Dataset({
        features: ['content', 'avg_line_length', 'max_line_length', 'alphanum_fraction', 'licenses', 'repository_name', 'path', 'size', 'lang'],
        num_rows: 10000
    })
})
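Before fine-tuning, it may help to drop samples that look minified or auto-generated. The sketch below uses the 'avg_line_length', 'max_line_length' and 'alphanum_fraction' columns from the output above. The threshold values and the keep_sample helper are hypothetical, not something prescribed by the dataset or the model, and the two rows are made-up stand-ins so the snippet runs without downloading anything.

```python
# Hypothetical thresholds (assumptions, not values taken from the dataset card);
# tune them after looking at your own data.
MAX_AVG_LINE_LENGTH = 100
MAX_LINE_LENGTH = 1000
MIN_ALPHANUM_FRACTION = 0.25


def keep_sample(row):
    """Return True if the row's metadata looks like ordinary source code."""
    return (
        row["avg_line_length"] <= MAX_AVG_LINE_LENGTH
        and row["max_line_length"] <= MAX_LINE_LENGTH
        and row["alphanum_fraction"] >= MIN_ALPHANUM_FRACTION
    )


# Made-up rows that mimic the dataset's column names, for illustration only.
rows = [
    {
        "content": "print('hi')\n",
        "avg_line_length": 11.0,
        "max_line_length": 11,
        "alphanum_fraction": 0.64,
    },
    {
        # One very long line, e.g. an embedded data blob.
        "content": "x = '" + "A" * 5000 + "'\n",
        "avg_line_length": 5006.0,
        "max_line_length": 5006,
        "alphanum_fraction": 0.99,
    },
]

kept = [r for r in rows if keep_sample(r)]
print(len(kept), "of", len(rows), "rows kept")
```

With the real dataset loaded as ds, the same predicate can be passed to ds["train"].filter(keep_sample), since Dataset.filter accepts a function that takes one example and returns a bool.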