Yi3852 committed (verified) · Commit c00ce5f · 1 Parent(s): 6669548

Update README.md

Files changed (1): README.md +25 -0

README.md CHANGED
@@ -1,3 +1,28 @@
  ---
  license: mit
+ datasets:
+ - HuggingFaceFW/fineweb-edu
+ language:
+ - en
+ pipeline_tag: text-generation
  ---
+ Using the training code from github.com/karpathy/build-nanogpt, the model was trained on the HuggingFaceFW/fineweb-edu dataset for 10B tokens, which took 112 hours on my 4070 Ti GPU.
+ The training context length is 512 tokens, with a final validation loss of 2.91 and a HellaSwag eval accuracy of 0.3465.
+
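+ For reference, the following is a minimal sketch of how the 10B-token fineweb-edu sample can be tokenized into training shards. It assumes the GPT-2 BPE via tiktoken and a 100M-token shard size, in the spirit of the fineweb.py script in build-nanogpt; the shard filename and size here are illustrative, not the exact preprocessing used.
+
+ ```python
+ # Illustrative sketch: stream the fineweb-edu 10BT sample and tokenize it
+ # with the GPT-2 BPE, roughly as build-nanogpt's fineweb.py does.
+ import numpy as np
+ import tiktoken
+ from datasets import load_dataset
+
+ enc = tiktoken.get_encoding("gpt2")
+ eot = enc.eot_token  # "<|endoftext|>" id, used to delimit documents
+
+ ds = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
+                   split="train", streaming=True)
+
+ SHARD = 100_000_000  # hypothetical shard size in tokens
+ buf, shard = [], 0
+ for doc in ds:
+     buf.append(eot)  # each document is preceded by the end-of-text token
+     buf.extend(enc.encode_ordinary(doc["text"]))
+     while len(buf) >= SHARD:
+         # GPT-2 token ids fit in uint16 (vocab 50257 < 65536)
+         np.array(buf[:SHARD], dtype=np.uint16).tofile(f"shard_{shard:04d}.bin")
+         buf, shard = buf[SHARD:], shard + 1
+ ```
+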
+ With n_layer=20, n_head=14, and n_embd=896, the model's architecture is very similar to GPT-2's, except that I modified the expansion factor of the MLP from 2x to 5x.
+
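+ As a sketch, the configuration implied above would look like the following (field names follow build-nanogpt's GPTConfig; block_size and vocab_size are assumptions based on the 512-token context and the GPT-2 tokenizer, not values read from the checkpoint):
+
+ ```python
+ # Sketch of the implied configuration; field names follow build-nanogpt's GPTConfig.
+ from dataclasses import dataclass
+
+ @dataclass
+ class GPTConfig:
+     block_size: int = 512    # training context length (stated above)
+     vocab_size: int = 50257  # assumption: GPT-2 BPE vocabulary
+     n_layer: int = 20
+     n_head: int = 14
+     n_embd: int = 896        # 896 / 14 heads = 64-dim heads, as in GPT-2
+
+ # The one architectural change: the MLP hidden width is 5 * n_embd,
+ # e.g. nn.Linear(config.n_embd, 5 * config.n_embd) in the MLP block.
+ ```
+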
+ **Usage:**
+ ```python
+ import torch
+ from transformers import GPT2Tokenizer
+ from model import GPT, GPTConfig  # model.py from this repo
+
+ # GPT-2 BPE tokenizer; the model shares the GPT-2 vocabulary
+ tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # load the checkpoint and move the model to the GPU if available
+ model = GPT.from_pretrained('model.pt')
+ model = model.to(device)
+
+ prompt = "Hello, I'm a language model,"
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids
+ generated_ids = model.generate(input_ids.to(device))
+ print(tokenizer.decode(generated_ids))
+ ```