Update README.md
README.md CHANGED
@@ -1,3 +1,28 @@
 ---
 license: mit
+datasets:
+- HuggingFaceFW/fineweb-edu
+language:
+- en
+pipeline_tag: text-generation
 ---
+Using the training code from github.com/karpathy/build-nanogpt, this model was trained on the HuggingFaceFW/fineweb-edu dataset for 10B tokens, which took 112 hours on my 4070 Ti GPU.
+The training context length is 512 tokens, with a final validation loss of 2.91 and a HellaSwag eval score of 0.3465.
+
+With n_layer=20, n_head=14 and n_embd=896, the model's architecture is very similar to GPT-2, except that I change the MLP expansion factor from 2x to 5x.
+
+
+**Usage:**
+```python
+import torch
+from transformers import GPT2Tokenizer
+from model import GPT, GPTConfig
+tokenizer = GPT2Tokenizer.from_pretrained('gpt2')  # GPT-2 BPE tokenizer
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model = GPT.from_pretrained('model.pt')  # load the pretrained checkpoint
+model = model.to(device)
+prompt = "Hello, I'm a language model,"
+input_ids = tokenizer(prompt, return_tensors="pt").input_ids
+generated_ids = model.generate(input_ids.to(device))  # generate a continuation of the prompt
+print(tokenizer.decode(generated_ids))
+```
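The training cost quoted in the README (10B tokens in 112 hours) also pins down the effective throughput. A one-line check, using only the numbers stated above:

```python
# Effective training throughput implied by the README: 10B tokens in 112 hours.
tokens = 10e9
hours = 112
print(f"{tokens / (hours * 3600):,.0f} tokens/s")  # ~24,800 tokens/s
```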
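As a rough sanity check of the architecture described in the README, the sketch below rebuilds the stated configuration (n_layer=20, n_head=14, n_embd=896, 512-token context, 5x MLP width) and estimates the parameter count. The GPTConfig field names follow nanoGPT conventions and the 50257-token GPT-2 vocabulary is an assumption; the repo's own model.py may define these differently.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Field names follow nanoGPT conventions; the repo's model.py may differ.
    block_size: int = 512     # training context length stated in the README
    vocab_size: int = 50257   # GPT-2 BPE vocabulary (assumed from the GPT2Tokenizer usage)
    n_layer: int = 20
    n_head: int = 14
    n_embd: int = 896
    mlp_ratio: int = 5        # widened MLP: hidden size = 5 * n_embd per the README

def approx_params(cfg: GPTConfig) -> int:
    """Rough parameter count: embeddings + attention + MLP, ignoring biases and LayerNorms."""
    emb = (cfg.vocab_size + cfg.block_size) * cfg.n_embd  # token + position embeddings (lm_head tied)
    attn = 4 * cfg.n_embd ** 2                            # q, k, v and output projections
    mlp = 2 * cfg.mlp_ratio * cfg.n_embd ** 2             # up- and down-projection
    return emb + cfg.n_layer * (attn + mlp)

print(f"~{approx_params(GPTConfig()) / 1e6:.0f}M parameters")
```

Under this approximation the model comes out to roughly 270M parameters, most of them in the 20 transformer blocks.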