
GPT-2 (125M) with 4k-token context

A fine-tune of the smallest GPT-2 model on The Pile with a 4k-token context length. Weights are included, and the model follows Karpathy's nanoGPT implementation. It was trained for ~1 million iterations with an increasing batch size, ending at 32k. The final loss is 3.9, which is likely limited by the 768-dimensional embedding size.
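As a rough sanity check on the model shape, here is a minimal sketch of how the parameter count follows from these hyperparameters, assuming the standard GPT-2 small layout (12 layers, 768-dim embeddings, 50257-token vocabulary) extended to a 4096-token context; the exact checkpoint layout may differ (e.g. whether the output head is tied to the token embeddings):

```python
def gpt2_param_count(n_layer=12, n_embd=768, vocab_size=50257,
                     block_size=4096, tie_embeddings=True):
    """Count learnable parameters in a GPT-2-style decoder (hypothetical helper)."""
    tok_emb = vocab_size * n_embd            # token embedding table
    pos_emb = block_size * n_embd            # learned positional embeddings
    ln = 2 * n_embd                          # one layernorm: weight + bias
    # attention: fused qkv projection + output projection (with biases)
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4*n_embd, then project back (with biases)
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    block = 2 * ln + attn + mlp              # two layernorms per block
    head = 0 if tie_embeddings else vocab_size * n_embd  # untied lm_head
    return tok_emb + pos_emb + n_layer * block + ln + head  # + final layernorm

print(gpt2_param_count())  # → 126799104, i.e. ~127M with weight tying
```

The 4k context only adds ~3M positional-embedding parameters over the stock 1024-token GPT-2, so the model stays in the 125M class; the on-disk safetensors size can differ depending on weight tying and any extra stored buffers.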

Safetensors model size: 176M params (tensor types: F32, BOOL)