Text Generation · Transformers · PyTorch · English · gpt2a · custom_code
crumb committed b96b864 (1 parent: 7140f59)

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -8,7 +8,7 @@ language:
 
 ---
 
-A modified GPT-2 model with only 25 million non-embedding params that outbenches GPT-2(124m), Pythia-70m/160m, and Cerebras-111m, it has ScaledSinusoidal position embeddings, embedding layernorm, no biases, and was trained on only 8 billion tokens of the SlimPajama dataset at home on 2xA6000.
+A modified GPT-2 model with only 25 million non-embedding params that outbenches GPT-2(124m), Pythia-70m/160m, and Cerebras-111m, it has ScaledSinusoidal position embeddings, embedding layernorm, no biases, and was trained on only 8 billion tokens of the SlimPajama dataset at home on 2xA6000. (On the graphic it's mis-labeled as cramp-41m)
 
 | model | avg | arc | hellaswag | mmlu | truthfulqa |
 | --- | --- | --- | --- | --- | --- |
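Since the page is tagged `custom_code`, loading the model through Transformers requires `trust_remote_code=True` so the repo's custom modeling files are executed. A minimal usage sketch follows; the repo id `crumb/gpt2a` is an assumption inferred from the page tags and committer name, not stated in this commit.

```python
# Minimal sketch: loading a custom_code model from the Hugging Face Hub.
# NOTE: the repo id "crumb/gpt2a" is assumed from the page tags, not confirmed here.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "crumb/gpt2a"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code=True is required for models that ship custom modeling code.
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```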
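The README mentions ScaledSinusoidal position embeddings, i.e. a fixed sinusoidal table multiplied by a learnable scale. Below is a minimal PyTorch sketch of that idea; the scale's init value of `1/sqrt(dim)` and the exact placement relative to the embedding layernorm are assumptions, as the commit does not specify them.

```python
# A minimal sketch of ScaledSinusoidal position embeddings: a standard
# sinusoidal table scaled by a single learnable parameter.
import math
import torch
import torch.nn as nn

class ScaledSinusoidal(nn.Module):
    def __init__(self, dim: int, max_len: int = 2048):
        super().__init__()
        # Standard sinusoidal table: sin on even dims, cos on odd dims.
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)
        # Single learnable scale; init value 1/sqrt(dim) is an assumption.
        self.scale = nn.Parameter(torch.tensor(1.0 / math.sqrt(dim)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) token embeddings
        return x + self.scale * self.pe[: x.size(1)]
```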