
nanoGPT - Character-Level Shakespeare - Tied Weights

A small character-level, GPT-style language model trained on the works of Shakespeare using Andrej Karpathy's nanoGPT repo. It comes from my project LLMs Universally Learn a Feature Representing Token Frequency / Rarity.


This model has two versions:

  1. With tied embedding / unembedding weights (in true GPT fashion) - THIS PAGE
  2. Without tied embedding / unembedding weights


The model can be loaded using AutoModel from Hugging Face's transformers package:

>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("sosier/nanoGPT-shakespeare-char-tied-weights", trust_remote_code=True)
>>> model
number of parameters: 10.65M

  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(256, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
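
The reported 10.65M can be reproduced by hand from the layer shapes printed above. The sketch below is plain arithmetic, not code from the model repo. It rests on three assumptions: the tied `wte` / `lm_head` matrix is counted once, the `LayerNorm` modules carry a weight vector but no bias (consistent with `bias=False` on every `Linear`), and the reported figure excludes the position embeddings `wpe`, as nanoGPT's parameter count does by default. The function and constant names are illustrative.

```python
# Dimensions read off the printed architecture.
VOCAB_SIZE = 65    # rows of wte
BLOCK_SIZE = 256   # rows of wpe
N_EMBD = 384
N_LAYER = 6


def count_params(include_position_embeddings=False):
    """Count parameters under the assumptions stated in the lead-in."""
    wte = VOCAB_SIZE * N_EMBD      # shared with lm_head (tied), counted once
    wpe = BLOCK_SIZE * N_EMBD
    layer_norm = N_EMBD            # assumption: weight only, no bias
    attn = N_EMBD * (3 * N_EMBD) + N_EMBD * N_EMBD        # c_attn + c_proj
    mlp = N_EMBD * (4 * N_EMBD) + (4 * N_EMBD) * N_EMBD   # c_fc + c_proj
    block = 2 * layer_norm + attn + mlp
    total = wte + wpe + N_LAYER * block + layer_norm      # final term is ln_f
    return total if include_position_embeddings else total - wpe


print(f"{count_params() / 1e6:.2f}M")  # prints 10.65M
```

Under the same assumptions, a separate (untied) `lm_head` would add another 65 × 384 = 24,960 parameters.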

Training Data / Token Counts

The training data token counts can be found on my GitHub repo here and can be loaded using the instructions here.


Because this is a character-level model, the tokenizer is simply a mapping from each character to its token ID, as given in the token counts (see the section above).
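
A character-to-ID mapping of this kind can be sketched in a few lines. The example below is illustrative only: it builds a hypothetical vocabulary from a short sample string by sorting its unique characters, so the IDs it produces will not match this model's. For real use, the mapping should be taken from the token counts referenced above. The helper names `build_vocab`, `encode`, and `decode` are assumptions, not part of the model repo.

```python
def build_vocab(text):
    """Map each unique character to an integer ID (hypothetical ordering: sorted)."""
    chars = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(chars)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return char_to_id, id_to_char


def encode(text, char_to_id):
    """Turn a string into a list of token IDs, one per character."""
    return [char_to_id[ch] for ch in text]


def decode(ids, id_to_char):
    """Turn a list of token IDs back into a string."""
    return "".join(id_to_char[i] for i in ids)


# Stand-in sample text; the real vocabulary has 65 characters.
sample = "to be or not to be"
char_to_id, id_to_char = build_vocab(sample)

ids = encode("to be", char_to_id)
print(ids)
print(decode(ids, id_to_char))
```

Encoding followed by decoding returns the original string for any text made up of characters in the vocabulary. A character outside the vocabulary raises a `KeyError` in this sketch.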
