A modified GPT-2 model with only 25 million non-embedding parameters that outbenches GPT-2 (124M), Pythia-70m/160m, and Cerebras-111m. It uses ScaledSinusoidal position embeddings, embedding layernorm, and no biases, and was trained on only 8 billion tokens of the SlimPajama dataset at home on 2×A6000s. (On the graphic it is mislabeled as cramp-41m.)

| model | avg | arc | hellaswag | mmlu | truthfulqa |
| --- | --- | --- | --- | --- | --- |