Training - essentially I only want to code Python.

#1
by wws911 - opened

Hi there, I would like to use this model only for coding in Python. What would be the most efficient way to reduce the file size and, at the same time, train and fine-tune it strictly for Python?

Owner

If you want to fine-tune strictly for Python, then you just need to change the dataset.
You can check out: https://huggingface.co/datasets/bigcode/the-stack-smol

For Python:
from datasets import load_dataset
load_dataset("bigcode/the-stack-smol", data_dir="data/python")

DatasetDict({
    train: Dataset({
        features: ['content', 'avg_line_length', 'max_line_length', 'alphanum_fraction', 'licenses', 'repository_name', 'path', 'size', 'lang'],
        num_rows: 10000
    })
})
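
If it helps, here is a rough sketch of how the Python subset above could be cleaned up a little before fine-tuning, using the metadata columns shown in the output. The helper names `keep_sample` and `load_python_subset` and both threshold values are my own assumptions for illustration, not something from the model's training script, so adjust them to your data. Loading needs the `datasets` package and network access, and you may have to accept the dataset's terms on the Hub first.

```python
def keep_sample(example, max_line_length=1000, min_alphanum_fraction=0.25):
    """Return True if a row looks like ordinary source code.

    Both thresholds are illustrative assumptions, not values from this thread.
    """
    return (
        # Very long lines often indicate minified or generated files.
        example["max_line_length"] <= max_line_length
        # A low alphanumeric fraction often indicates data blobs.
        and example["alphanum_fraction"] >= min_alphanum_fraction
        # Skip files that are empty or whitespace only.
        and example["content"].strip() != ""
    )


def load_python_subset():
    """Load the Python split and drop rows rejected by keep_sample."""
    # Imported here so keep_sample can be used without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(
        "bigcode/the-stack-smol", data_dir="data/python", split="train"
    )
    return ds.filter(keep_sample)
```

Calling `load_python_subset()` returns a `Dataset` whose `content` column can then be passed to whatever tokenization and training code you already use.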
