Update README.md
README.md CHANGED
@@ -1,3 +1,28 @@
 ---
 license: mit
+datasets:
+- HuggingFaceFW/fineweb-edu
+language:
+- en
+pipeline_tag: text-generation
 ---
+Using the training code from github.com/karpathy/build-nanogpt, this model was trained on the HuggingFaceFW/fineweb-edu dataset for 10B tokens, which took 112 hours on my 4070 Ti GPU.
+The training context length is 512 tokens, with a final validation loss of 2.91 and a HellaSwag eval score of 0.3465.
+
+With n_layer=20, n_head=14 and n_embd=896, the model's architecture is very similar to GPT-2, except that I change the MLP expansion factor from 2x to 5x.
+
+
+**Usage:**
+```python
+import torch
+from transformers import GPT2Tokenizer
+from model import GPT, GPTConfig
+tokenizer = GPT2Tokenizer.from_pretrained('gpt2')  # GPT-2 BPE tokenizer
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model = GPT.from_pretrained('model.pt')  # load the pretrained checkpoint
+model = model.to(device)
+prompt = "Hello, I'm a language model,"
+input_ids = tokenizer(prompt, return_tensors="pt").input_ids
+generated_ids = model.generate(input_ids.to(device))  # generate a continuation of the prompt
+print(tokenizer.decode(generated_ids))
+```
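The training cost quoted in the README (10B tokens in 112 hours) also pins down the effective throughput. A one-line check, using only the numbers stated above:

```python
# Effective training throughput implied by the README: 10B tokens in 112 hours.
tokens = 10e9
hours = 112
print(f"{tokens / (hours * 3600):,.0f} tokens/s")  # ~24,800 tokens/s
```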
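As a rough sanity check of the architecture described in the README, the sketch below rebuilds the stated configuration (n_layer=20, n_head=14, n_embd=896, 512-token context, 5x MLP width) and estimates the parameter count. The GPTConfig field names follow nanoGPT conventions and the 50257-token GPT-2 vocabulary is an assumption; the repo's own model.py may define these differently.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Field names follow nanoGPT conventions; the repo's model.py may differ.
    block_size: int = 512     # training context length stated in the README
    vocab_size: int = 50257   # GPT-2 BPE vocabulary (assumed from the GPT2Tokenizer usage)
    n_layer: int = 20
    n_head: int = 14
    n_embd: int = 896
    mlp_ratio: int = 5        # widened MLP: hidden size = 5 * n_embd per the README

def approx_params(cfg: GPTConfig) -> int:
    """Rough parameter count: embeddings + attention + MLP, ignoring biases and LayerNorms."""
    emb = (cfg.vocab_size + cfg.block_size) * cfg.n_embd  # token + position embeddings (lm_head tied)
    attn = 4 * cfg.n_embd ** 2                            # q, k, v and output projections
    mlp = 2 * cfg.mlp_ratio * cfg.n_embd ** 2             # up- and down-projection
    return emb + cfg.n_layer * (attn + mlp)

print(f"~{approx_params(GPTConfig()) / 1e6:.0f}M parameters")
```

Under this approximation the model comes out to roughly 270M parameters, most of them in the 20 transformer blocks.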